Abstract: In this paper, we derive non-asymptotic achievability and converse bounds on random number generation with and without side-information. Our bounds are efficiently computable in the sense that their computational complexity does not depend on the block length. Using these bounds, we also characterize the asymptotic behavior in the large deviation regime and the moderate deviation regime, which implies that our bounds are asymptotically tight in those regimes. We also derive the second order rates of those problems, together with single letter forms of the variances characterizing the second order rates. Further, we address the relative entropy rate and the modified mutual information rate for these problems.

Index Terms: Markov Chain, Non-Asymptotic Analysis, Random Number Generation


I. Introduction 

A. Uniform random number generation (URNG) 

Uniform random number generation is one of the important tasks in information theory as well as in secure communication. When a non-uniform random number is generated by an independent and identically distributed (i.i.d.) source whose distribution is known to be $P_X$, we can convert it to a uniform random number, and the optimal conversion rate is known to be the entropy $H(P_X)$ G). Vembu and Verdú 0 extended this problem to general information sources. Applying their result to a Markov source, we find that the optimal conversion rate is the entropy rate.

On the other hand, many researchers in information theory have recently been attracted to non-asymptotic analysis a, a, a. Since all realistic situations are non-asymptotic, it is strongly desired to evaluate the performance of a protocol in the non-asymptotic setting. In the case of uniform random number generation, we need to consider two issues:

A1) How to quantitatively guarantee the security for a finite block length $n$. As the criterion, we employ the variational distance because it is universally composable [7].

A2) How to implement the extraction method efficiently.

Fortunately, the latter problem has been solved by employing universal hash functions, which can be constructed by combining a Toeplitz matrix and the identity matrix 0. This construction has low complexity and was implemented in a real demonstration [9], [10]. Recently, the paper □H proposed a new class of hash functions, $\varepsilon$-almost dual universal hash functions, and the paper El proposed more efficient hash functions belonging to this new class. Hence, what remains is to solve the first problem.
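To make the Toeplitz-plus-identity construction concrete, the following is a minimal sketch in Python. It assumes one common form of the construction, namely the binary matrix [T | I] with a random Toeplitz block T and an identity block I; the function names, the seed layout, and the exhaustive collision check are illustrative assumptions and are not taken from the cited implementations.

```python
from itertools import product

def toeplitz_identity_hash(x, seed, m):
    """Apply the m x n binary matrix [T | I] to the bit list x (arithmetic mod 2).

    T is an m x (n - m) Toeplitz matrix with T[i][j] = seed[i - j + (n - m) - 1],
    and I is the m x m identity matrix. The seed has n - 1 bits (illustrative layout).
    """
    n = len(x)
    k = n - m                          # number of columns of the Toeplitz block
    assert 0 < m < n and len(seed) == n - 1
    out = []
    for i in range(m):
        acc = x[k + i]                 # contribution of the identity block
        for j in range(k):
            acc ^= seed[i - j + k - 1] & x[j]
        out.append(acc)
    return out

def max_collision_probability(n, m):
    """Largest collision probability over all pairs of distinct inputs, by brute force."""
    seeds = list(product([0, 1], repeat=n - 1))
    worst = 0.0
    for d in product([0, 1], repeat=n):
        if not any(d):
            continue
        # By linearity, x and x XOR d collide exactly when d is hashed to the zero string.
        hits = sum(1 for s in seeds if not any(toeplitz_identity_hash(list(d), s, m)))
        worst = max(worst, hits / len(seeds))
    return worst
```

For small parameters, `max_collision_probability(n, m)` can be compared with $2^{-m}$, which is the collision bound required of a two-universal family with $2^m$ outputs. Evaluating one hash costs $O(nm)$ bit operations in this naive form.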

So far, for a large block length $n$, quantitative evaluation of the security has been done only for i.i.d. sources 0, El. However, the source is not necessarily i.i.d. in the real world, and it is necessary to develop a technique to evaluate the security for non-i.i.d. sources. As a first step in this direction of research, we consider the Markov source in this paper. In the following, we explain the difficulties in extending the existing results for the i.i.d. source to the Markov source.

Although it is not stated explicitly in the literature, we believe that there are two important criteria for non-asymptotic bounds:

B1) Computational complexity, and

B2) Asymptotic optimality.

Let us first consider the first criterion, i.e., the computational complexity. For example, Han [13] introduced lower and upper bounds for the variational distance criterion by using the inf-spectral entropy, which are called the inf-spectral entropy bounds. For i.i.d. sources, these bounds can be computed by numerical calculation packages. However, there is no known method to efficiently compute these bounds for Markov sources. Consequently, there has been no bound that is efficiently computable for the Markov chain so far. The first purpose of this paper is to derive non-asymptotic bounds that are efficiently computable.

Next, let us consider the second criterion, i.e., asymptotic optimality. So far, three kinds of asymptotic regimes have been studied in information theory:

B2-1) The large deviation regime, in which the error probability $\varepsilon$ asymptotically behaves like $e^{-nr}$ for some $r > 0$ am

B2-2) The moderate deviation regime, in which $\varepsilon$ asymptotically behaves like $e^{-n^{1-2t} r}$ for some $r > 0$ and $t \in (0, 1/2)$ OH, El, E3, and

B2-3) The second order regime, in which $\varepsilon$ is a constant EH, 0, 0, 0, El, El, E§-

We claim that a good non-asymptotic bound should be asymptotically optimal in at least one of the three regimes mentioned above.

Further, when the generation rate is too large, the variational distance is close to 1. In this case, the variational distance cannot measure how far the generated random number is from the uniform random number. Hence, we also employ the relative entropy rate (RER).

B. Secure uniform random number generation (SURNG) 

When the initial random number $X$ is partially leaked to a third party holding $Y$, we need, in order to guarantee the security, to convert the random number to a uniform random number that has almost no correlation with the third party. When a non-uniform random number is generated by an i.i.d. source whose joint distribution is known to be $P_{XY}$, we can convert it to a uniform random number, and the optimal conversion rate is known to be the conditional entropy $H(X|Y)$ [23], EH.

Bennett et al. [22], [23] and Håstad et al. [24] proposed to use universal hash functions for this purpose, and derived the two-universal hashing lemma, which provides an upper bound on the leaked information based on the Rényi entropy of order 2. The paper ITO proposed to use $\varepsilon$-almost dual universal hash functions ED, which include the hash functions by [10]. Hence, problem A2) has been solved by employing universal hash functions.

Therefore, the remaining problem is problem A1), i.e., to quantitatively guarantee the security for a finite block length $n$ under these hash functions. For the security criterion, we employ the variational distance between the true distribution and the ideal distribution because it satisfies the universally composable property [7]. To achieve the rate $H(X|Y)$ via the two-universal hashing lemma, Renner [25] applied smoothing to the min-entropy (footnote 1), which is a lower bound on the above conditional Rényi entropy of order 2 (footnote 2). That is, he proposed to maximize the min-entropy among the sub-distributions whose variational distance to the true distribution is less than a given threshold. Using Renner's method, the paper [12] derived a lower bound on the exponential decreasing rate. Tomamichel and Hayashi [26] derived an upper bound on the universally composable quantity of the extracted key with a finite block length $n$ by combining Renner's method and the method of information spectrum by Han. Further, Watanabe and Hayashi [27] compared two approaches: the combination of Renner's method and the method of information spectrum rrfi and the exponential bounding approach of El. Further, the paper [28] showed that similar evaluations are possible even for $\varepsilon$-almost dual universal hash functions urn

For convenience, let us call the bound derived by the former approach the inf-spectral entropy bound, and the bound derived by the latter approach the exponential bound. It turned out that the exponential bound is tighter than the inf-spectral entropy bound when the required security level $\varepsilon$ is rather small. A bound that interpolates between both approaches was also derived in [27], which we called the hybrid bound.

1 Bennett et al. [23] also employed a similar idea without using the terminology of smoothing, and derived the conversion rate $H(X|Y)$.

2 In [25], Renner also showed a quantum extension of the two-universal hashing lemma.

3 The approach to derive a bound in [27] is almost the same as that in [26], but it should be noted that the security criterion in [27] is based on the variational distance while that in [26] is based on the purified distance.

Similar to uniform random number generation, for i.i.d. sources, the inf-spectral entropy bound and the hybrid bound can be computed by numerical calculation packages. However, there is no known method to efficiently compute these bounds for Markov sources. For i.i.d. sources, the computational complexity of the exponential bound is $O(1)$ since the exponential bound is described by using the Gallager function, which is an additive quantity. However, this is not the case for Markov sources. Consequently, there has been no bound that is efficiently computable for the Markov chain so far. Further, to the best of the authors' knowledge, the first order results for Markov sources have not been established, and they are clarified in this paper.
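The role of additivity can be illustrated with a small numerical sketch. It uses the Rényi entropy as a stand-in for the Gallager function (the two are related by a change of variables), and the function names and the example distribution are illustrative assumptions. For an i.i.d. source, the block quantity equals $n$ times the single-letter quantity, so no enumeration over the $|\mathcal{X}|^n$ sequences is needed.

```python
import math
from itertools import product

def renyi_entropy(p, theta):
    """Renyi entropy of order 1 + theta (natural logarithm), theta in (-1, 0) or (0, inf)."""
    return -math.log(sum(q ** (1 + theta) for q in p if q > 0)) / theta

def renyi_entropy_iid_bruteforce(p, theta, n):
    """Renyi entropy of the n-fold product distribution, enumerating all |X|^n sequences."""
    block_probs = [math.prod(seq) for seq in product(p, repeat=n)]
    return renyi_entropy(block_probs, theta)

def renyi_entropy_iid_additive(p, theta, n):
    """Same quantity via additivity: the cost does not depend on n."""
    return n * renyi_entropy(p, theta)
```

The brute-force version takes time exponential in $n$, while the additive version is a single-letter computation. For a Markov source the product structure is lost, which is why a different single-letter quantity is needed there.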

Further, when the key generation rate is too large, the variational distance is close to 1. In this case, the variational distance cannot measure how far the generated random number is from the secure uniform random number. Hence, we employ the relative entropy between the generated random number and the ideal random number, which was introduced by Csiszár and Narayan [29] and is called the modified mutual information rate. Indeed, when we impose axiomatic conditions, the leaked information measure must be this quantity [28].

C. Main Contribution for Non-Asymptotic Analysis 

Although there are several studies on finite-length analysis for URNG and SURNG, they did not discuss Markov chains. Indeed, while they derived several single-shot bounds, these bounds cannot be directly applied to Markov chains, because the bounds obtained by such applications are not computable, at least for Markov chains. Hence, we need to derive new finite-length bounds for Markov chains by modifying existing single-shot bounds. For this purpose, we adopt a structure similar to that of the paper [30], which addresses source coding with Markov chains, because uniform random number generation and source coding share a common structure. Hence, the obtained results are also quite similar to those of the paper [30]. To derive non-asymptotic achievability bounds on the problems, we basically use exponential type bounds for the single-shot setting. When there is no information leakage, those exponential type bounds are described by the Rényi entropy. Thus, we need to evaluate the Rényi entropy for the Markov chain. For this purpose, we introduce the Rényi entropy for transition matrices, which is defined irrespective of initial distributions (cf. EIli). Then, we evaluate the Rényi entropy for the Markov chain in terms of the Rényi entropy for the transition matrix. From this evaluation, we can also find that the Rényi entropy rate for the Markov chain coincides with the Rényi entropy for the transition matrix. Note that the former is defined as a limit, whereas the latter has a single letter characterization.
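The following sketch illustrates the coincidence of the two quantities numerically. It assumes the definition commonly used for transition matrices, $-\frac{1}{\theta}\log\lambda_\theta$ with $\lambda_\theta$ the Perron-Frobenius eigenvalue of the matrix with entries $W(x|x')^{1+\theta}$; the precise definition is given later in the paper and is not reproduced in this section, so this form, the function names, and the restriction to entrywise-positive matrices are assumptions made for illustration.

```python
import math

def perron_root(M, iters=500):
    """Largest eigenvalue of an entrywise-positive square matrix via power iteration."""
    d = len(M)
    v = [1.0 / d] * d
    lam = 0.0
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(d)) for i in range(d)]
        lam = sum(w)               # v sums to 1, so this estimates the eigenvalue
        v = [wi / lam for wi in w]
    return lam

def renyi_entropy_transition(W, theta):
    """-(1/theta) log of the Perron root of [W(x|x')^(1+theta)]; W[x][xp] = W(x|xp)."""
    tilted = [[w ** (1 + theta) for w in row] for row in W]
    return -math.log(perron_root(tilted)) / theta

def renyi_entropy_rate_block(W, p0, theta, n):
    """(1/n) H_{1+theta}(X^n) for initial distribution p0, in O(n |X|^2) time."""
    d = len(W)
    tilted = [[w ** (1 + theta) for w in row] for row in W]
    v = [p ** (1 + theta) for p in p0]
    log_total = 0.0
    for _ in range(n - 1):
        v = [sum(tilted[i][j] * v[j] for j in range(d)) for i in range(d)]
        s = sum(v)
        log_total += math.log(s)   # rescale at every step to avoid underflow
        v = [vi / s for vi in v]
    log_total += math.log(sum(v))
    return -log_total / (theta * n)
```

The block quantity depends on the initial distribution and on $n$, whereas the transition-matrix quantity depends on neither. In this sketch the gap between them shrinks on the order of $1/n$.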

When a part of the information is leaked to the third party, to generate a secure uniform random number, we consider two assumptions on transition matrices (see Assumption 1 and Assumption 2 of Section II). Although a computable form of the conditional entropy rate is not known in general, Assumption 1, which is less restrictive than Assumption 2, enables us to derive a computable form of the conditional entropy rate.

In the problems with side-information, exponential type bounds are described by conditional Rényi entropies. There are several definitions of conditional Rényi entropies (see ED, [32] for an extensive review), and we use the one defined in © and the one defined by Arimoto [33]. We shall call the former the lower conditional Rényi entropy (cf. ©) and the latter the upper conditional Rényi entropy (cf. ©). To derive non-asymptotic bounds, we need to evaluate these information measures for the Markov chain. For this purpose, under Assumption 1, we introduce the lower conditional Rényi entropy for transition matrices (cf. (27)). Then, we evaluate the lower conditional Rényi entropy for the Markov chain in terms of its transition matrix counterpart. This evaluation gives non-asymptotic bounds for secure uniform random number generation under Assumption 1. Under a more restrictive assumption, i.e., Assumption 2, we also introduce the upper conditional Rényi entropy for a transition matrix (cf. (34)). Then, we evaluate the upper conditional Rényi entropy for the Markov chain in terms of its transition matrix counterpart. This evaluation gives non-asymptotic bounds that are tighter than those obtained under Assumption 1.

We also derive converse bounds for every problem by using the change of measure argument developed by the authors in the accompanying paper on information geometry EH, ED-. When there is no information leakage, the converse bounds are described by the Rényi entropy for transition matrices. When a part of the information is leaked to the third party, we further introduce the two-parameter conditional Rényi entropy and its transition matrix counterpart (cf. (14) and (38)). This novel information measure includes the lower conditional Rényi entropy and the upper conditional Rényi entropy as special cases.

In the problem of SURNG, instead of the RER, we employ the modified mutual information rate (MMIR), which was introduced by Csiszár and Narayan [29] and whose axiomatic characterization was obtained in the paper [28]. When the uniformity is guaranteed, this quantity is given by the equivocation rate introduced by Wyner Efrj. When there is no information leakage, our lower and upper bounds are given by using the Rényi entropy for the Markov chain in terms of its transition matrix counterpart. When there exists information leakage, our lower and upper bounds are given by using the lower conditional Rényi entropy for the Markov chain in terms of its transition matrix counterpart under Assumption 1.

Here, we would like to remark on terminology. There are a few ways to express exponential type bounds. In statistics and large deviation theory, we usually use the cumulant generating function (CGF) to describe exponents. In information theory, we use the Gallager function or the Rényi entropies. Although these three notions are essentially the same and are related by changes of variables, the CGF and the Gallager function are convenient for some calculations since they have good properties such as convexity. However, they are merely mathematical functions. On the other hand, the Rényi entropies are information measures that include Shannon's information measures as special cases. Thus, the Rényi entropies are intuitively familiar in the field of information theory. The Rényi entropies also have the advantage that two types of bounds (e.g., (215) and (218)) can be expressed in a unified manner. For these reasons, we state our main results in terms of the Rényi entropies, while we use the CGF and the Gallager function in the proofs. For readers' convenience, the relations between the Rényi entropies and the corresponding CGFs are summarized in Appendix A.

Overall, we summarize the contributions for non-asymptotic analysis in comparison to existing results as follows.

(1) Finite-length bound: For URNG and SURNG, we derive finite-length bounds satisfying the conditions B1) and B2) for Markov chains. The theorems in Subsections III-C and IV-C are classified as this type of result. All existing finite-length bounds in a computable form are obtained in the i.i.d. setting. Indeed, several single-shot bounds were obtained in a more general form. However, their computability has not been discussed in the Markov case. At least, many of them (e.g., Lemmas 16, 17, 18, 22, 23, 25, and 28) are not given in a computable form in the Markov case.

(2) Single-shot bound: In this paper, we employ several existing single-shot bounds. However, many of them are not given in a useful form: they cannot be easily calculated, at least in the Markov case. To apply them to the Markov case, we loosen these bounds. Lemmas 21, 24, 29, and 32 fall into this category. Since these bounds have a much simpler form than the existing bounds, they might be applicable to other cases. This simplification is quite different from the case of source coding [30]. That is, this part poses the most serious technical difficulty compared with the paper [30], because the discussion here is specific to random number generation.

D. Main Contribution for Asymptotic Analysis 

To the authors' knowledge, there is no existing study on the asymptotic analysis of URNG and SURNG for Markov chains except for the following. For general sequences of information sources, the asymptotic rate of URNG was characterized by Vembu and Verdú El and Han m-. Since the asymptotic entropy rate of a Markov chain is known, we can calculate the asymptotic rate of URNG for the Markov chain. However, further analysis of URNG and SURNG has been carried out neither for Markov chains nor for general sequences of information sources.

We can easily see that these non-asymptotic bounds yield the asymptotically optimal random number generation rate, while the case with information leakage requires Assumption 1. For asymptotic analyses of the large deviation and the moderate deviation regimes, we derive the characterizations (footnote 4) by using our non-asymptotic achievability and converse bounds, which implies that our non-asymptotic bounds are tight in the large deviation regime and the moderate deviation regime.

4 For the large deviation regime, we only derive the characterizations up to the critical rates.

TABLE I
Summary of Asymptotic Results and Non-Asymptotic Bounds to Derive Asymptotic Results

Problem | First Order     | Large Deviation            | Moderate Deviation    | Second Order          | RER/MMIR
URNG    | Solved          | Solved* (U2), O(1)         | Solved, O(1)          | Solved, Tail          | Solved, O(1)
SURNG   | Solved (Ass. 1) | Solved* (Ass. 2, U2), O(1) | Solved (Ass. 1), O(1) | Solved (Ass. 1), Tail | Solved (Ass. 1), O(1)

URNG is the uniform random number generation without information leakage. SURNG is the secure uniform random number generation when a part of the information is leaked to the third party.

We also derive the second order rate. It is also clarified that the reciprocal coefficient of the moderate deviation regime and the variance of the second order regime coincide. Furthermore, a single letter form of the variance is clarified (footnote 5).

The asymptotic results and the non-asymptotic results are summarized in Table I. As a part of the non-asymptotic results, the table focuses on the computational complexities of the non-asymptotic bounds. "Solved*" indicates that those problems are solved up to the critical rates. "Ass. 1" and "Ass. 2" indicate that those problems are solved under Assumption 1 or Assumption 2. "U2" indicates that the converse results are obtained only for the worst case of the two-universal hash family (see (105) and (178)). "O(1)" indicates that both the achievability part and the converse part of those asymptotic results are derived from our non-asymptotic achievability and converse bounds whose computational complexities are $O(1)$. "Tail" indicates that both the achievability part and the converse part of those asymptotic results are derived from the information-spectrum type achievability and converse bounds, whose computational complexities depend on the computational complexities of tail probabilities.

Exact computation of tail probabilities is difficult in general, though it may be feasible in simple cases such as the i.i.d. case. One way to approximately compute tail probabilities is to use the Berry-Esseen theorem [39, Theorem 16.5.1] or its variant ED. This direction of research is still continuing eh, ei, and an evaluation of the constant was done in [42], though it is not clear how tight it is. If we can derive a tight Berry-Esseen type bound for the Markov chain, we can derive a non-asymptotic bound that is asymptotically tight in the second order regime. However, the approximation errors of Berry-Esseen type bounds converge only in the order of $1/\sqrt{n}$, so these bounds cannot be applied when $\varepsilon$ is rather small. Even in cases where exact computation of tail probabilities is possible, the information-spectrum type bounds are looser than the exponential type bounds when $\varepsilon$ is rather small, and we need to use appropriate bounds depending on the size of $\varepsilon$. In fact, this observation was explicitly clarified in El for random number generation with side-information. Consequently, we believe that our exponential type non-asymptotic bounds are very useful.

Further, we derive the asymptotic leaked information rate. When there is no information leakage, we discuss the RER, which is asymptotically given by the entropy rate. When there exists information leakage, we discuss the MMIR, which is asymptotically given by the conditional entropy rate under Assumption 1.

5 An alternative way to derive a single letter characterization of the variance for the Markov chain was shown in [37, Lemma 20]. It should also be noted that a single letter characterization can be derived by using the fundamental matrix El.

Overall, we summarize the contributions for asymptotic analysis in comparison to existing results as follows.

(1) New bounds for the Markov case: For URNG and SURNG, we derive the optimal asymptotic performances in Subsections III-D, III-E, III-F, III-G, IV-D, IV-E, IV-F, and IV-G under the four regimes, namely the large deviation regime, the moderate deviation regime, the second order regime, and the asymptotic relative entropy rate regime (the asymptotic modified mutual information rate regime), for Markov chains (with suitable conditions for SURNG). Except for the information spectrum approach, all existing asymptotic analyses of the first three regimes assume an i.i.d. source. Further, analyses based on the information spectrum approach derived only general formulas, which did not yield any computable asymptotic bounds in these three regimes for Markov chains.

(2) New bound even for the i.i.d. case: Among the above asymptotic results, Theorem 30 is novel even for the i.i.d. case. This theorem gives the converse bound for the large deviation regime of SURNG.

E. Two criteria 

In this paper, to address a practical issue, we employ two criteria. In channel coding, such a practical issue is discussed as coding theory, separately from the fundamental issue. However, in the case of random number generation, we can discuss the performance of hash functions with a small construction complexity in the same way as the fundamental issue. Such a practical issue is also a target of this paper. Usually, when we discuss a fundamental aspect of a topic in information theory, we focus only on the minimum leaked information among all hash functions, which is denoted by $\Delta(M)$ in this paper and whose precise definition will be given in Subsections III-A and IV-A. However, when we take into account the complexity of constructing the protocol, we need to restrict the hash functions to those with a small construction complexity. Hence, it is desirable to minimize the leaked information among a class of hash functions with a small calculation complexity for their construction. In this paper, we focus on the family of two-universal hash functions, called the two-universal hash family $\mathcal{F}$, because this family contains a hash function with a small construction complexity. However, this paper focuses on the worst leaked information $\overline{\Delta}(M)$ among the two-universal hash family $\mathcal{F}$, which is more important from a practical viewpoint than the best case for the following two reasons.

(1) Usually, the optimal hash function depends on the source distribution. However, it is not easy to perfectly identify the source distribution. In such a case, instead of the optimal hash function, we need to choose a hash function that works well universally. If we apply a two-universal hash function, its leaked information is never worse than the worst leaked information $\overline{\Delta}(M)$. Hence, if the quantity $\overline{\Delta}(M)$ is sufficiently close to the optimal case $\Delta(M)$, we can say that any two-universal hash function works well universally.

(2) Although the two-universal hash family $\mathcal{F}$ contains a hash function with a small calculation complexity for its construction, not every two-universal hash function has a small calculation complexity. If the quantity $\overline{\Delta}(M)$ is sufficiently close to the optimal case $\Delta(M)$, we can give priority to minimizing the construction complexity within the two-universal hash family $\mathcal{F}$ over optimizing the leaked information.

In this paper, we show that the worst leaked information $\overline{\Delta}(M)$ is close to the minimum leaked information $\Delta(M)$ in the moderate deviation regime and the second order regime. These results guarantee that any two-universal hash function has a sufficiently good performance. That is, they allow us to employ any two-universal hash function to achieve these asymptotically optimal performances. These results widen our choice of hash functions for achieving asymptotic optimality.


F. Organization of Paper and Notations 

As preparation, we explain information measures for the single-shot setting in Subsection II-A. Then, we address conditional Rényi entropies for transition matrices in Subsection II-B, and discuss the relation between these information measures and the Markov chain in Subsection II-C. These information measures and their properties will be used in the later sections. These contents were obtained in the paper ED, and their proofs are available in the paper [30]. However, the paper [30] did not address the conditional min-entropy, which corresponds to the order parameter $\infty$. So, in Subsections II-D and II-E, we discuss the relation between the limit of the conditional Rényi entropy and the conditional min-entropy; these are new results and are shown in the Appendix.

Section III addresses the uniform random number generation without information leakage. The obtained upper and lower bounds are numerically calculated for a typical example in that section. Then, Section IV proceeds to address the secure uniform random number generation with partial information leakage. As we mentioned above, we state our main results in terms of the Rényi entropies, and we use the CGFs and the Gallager function in the proofs. In Appendix A, the relations between the Rényi entropies and the corresponding CGFs are summarized. The relations between the Rényi entropies and the Gallager function are explained as necessary. Proofs of some technical results are also given in the remaining appendices.

A random variable is denoted by an upper case letter, and its realization is denoted by a lower case letter. The notation $\mathcal{P}(\mathcal{X})$ is the set of all distributions on the alphabet $\mathcal{X}$. The notation $\bar{\mathcal{P}}(\mathcal{X})$ is the set of all non-negative sub-normalized functions on $\mathcal{X}$. $|\mathcal{X}|$ represents the cardinality of the set $\mathcal{X}$. The cumulative distribution function of the standard Gaussian random variable is denoted by

$$\Phi(t) := \int_{-\infty}^{t} \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{x^2}{2}\right] dx. \tag{1}$$

Throughout the paper, the base of the logarithm is $e$.


II. Information Measures 
In this section, we introduce information measures that will be used in Section III and Section IV. All of the lemmas and theorems in this section except for Lemmas 15 and 12 and Theorem 6 were shown in [30].


A. Information Measures for Single-Shot Setting 

1) Conditional Rényi entropy relative to a general distribution: In this section, we introduce conditional Rényi entropies for the single-shot setting. For a more detailed review of conditional Rényi entropies, see [32]. For a correlated random variable $(X, Y)$ on $\mathcal{X} \times \mathcal{Y}$ with probability distribution $P_{XY}$ and a marginal distribution $Q_Y$ on $\mathcal{Y}$, we introduce the conditional Rényi entropy of order $1+\theta$ relative to $Q_Y$ as

$$H_{1+\theta}(P_{XY}|Q_Y) := -\frac{1}{\theta} \log \sum_{x,y} P_{XY}(x,y)^{1+\theta} Q_Y(y)^{-\theta}, \tag{2}$$

where $\theta \in (-1,0) \cup (0,\infty)$. The conditional Rényi entropy of order 0 relative to $Q_Y$ is defined by the limit with respect to $\theta$. When $\mathcal{Y}$ is a singleton, it is nothing but the ordinary Rényi entropy, and it is denoted by $H_{1+\theta}(X) = H_{1+\theta}(P_X)$ throughout the paper.
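Definition (2) translates directly into a few lines of code. The sketch below is only an illustration: the function name and the nested-list representation `P[x][y]` of the joint distribution are assumptions, and the code requires $Q_Y(y) > 0$ wherever $P_{XY}(x,y) > 0$.

```python
import math

def cond_renyi_relative(P, Q, theta):
    """H_{1+theta}(P_XY | Q_Y) from Eq. (2), natural logarithm.

    P[x][y] is the joint pmf and Q[y] is a pmf on Y.
    Assumes theta > -1, theta != 0, and Q[y] > 0 wherever P[x][y] > 0.
    """
    assert theta > -1 and theta != 0
    total = sum(P[x][y] ** (1 + theta) * Q[y] ** (-theta)
                for x in range(len(P)) for y in range(len(Q)) if P[x][y] > 0)
    return -math.log(total) / theta
```

With a single-element alphabet for $Y$ and `Q = [1.0]`, the function returns the ordinary Rényi entropy $H_{1+\theta}(P_X)$, matching the remark after (2).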

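2) Lower conditional Rényi entropy: One important special case of $H_{1+\theta}(P_{XY}|Q_Y)$ is the case with $Q_Y = P_Y$. We shall call this special case the lower conditional Rényi entropy of order $1+\theta$ and denote (footnote 6)

$$H^{\downarrow}_{1+\theta}(X|Y) := H_{1+\theta}(P_{XY}|P_Y) \tag{3}$$

$$= -\frac{1}{\theta} \log \sum_{x,y} P_{XY}(x,y)^{1+\theta} P_Y(y)^{-\theta}. \tag{4}$$

The following property holds.

Lemma 1 We have

$$\lim_{\theta \to 0} H^{\downarrow}_{1+\theta}(X|Y) = H(X|Y) \tag{5}$$

and

$$V(X|Y) := \mathrm{Var}\left[\log \frac{1}{P_{X|Y}(X|Y)}\right] \tag{6}$$

$$= \lim_{\theta \to 0} \frac{2\left[H(X|Y) - H^{\downarrow}_{1+\theta}(X|Y)\right]}{\theta}. \tag{7}$$

6 This notation was first introduced in ED.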
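The two limits in Lemma 1 can be checked numerically for a small joint distribution. The sketch below is illustrative only; the function names and the nested-list representation `P[x][y]` are assumptions, and the limits are approximated by evaluating at a small but nonzero $\theta$.

```python
import math

def marginal_y(P):
    """P_Y(y) = sum_x P_XY(x, y)."""
    return [sum(row[y] for row in P) for y in range(len(P[0]))]

def lower_cond_renyi(P, theta):
    """Lower conditional Renyi entropy from Eq. (4), natural logarithm."""
    PY = marginal_y(P)
    total = sum(P[x][y] ** (1 + theta) * PY[y] ** (-theta)
                for x in range(len(P)) for y in range(len(PY)) if P[x][y] > 0)
    return -math.log(total) / theta

def cond_entropy_and_variance(P):
    """H(X|Y) and V(X|Y) = Var[log 1/P_{X|Y}(X|Y)] as in Eq. (6)."""
    PY = marginal_y(P)
    terms = [(P[x][y], -math.log(P[x][y] / PY[y]))
             for x in range(len(P)) for y in range(len(PY)) if P[x][y] > 0]
    H = sum(p * z for p, z in terms)
    V = sum(p * (z - H) ** 2 for p, z in terms)
    return H, V

def variance_from_renyi(P, theta):
    """Finite-theta version of the right-hand side of Eq. (7)."""
    H, _ = cond_entropy_and_variance(P)
    return 2 * (H - lower_cond_renyi(P, theta)) / theta
```

Evaluating `lower_cond_renyi(P, 1e-4)` and `variance_from_renyi(P, 1e-4)` gives values close to $H(X|Y)$ and $V(X|Y)$, respectively, with an error that shrinks linearly in $\theta$.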
3) Upper conditional Rényi entropy: The other important special case of $H_{1+\theta}(P_{XY}|Q_Y)$ is the measure maximized over $Q_Y$. We shall call this special case the upper conditional Rényi entropy of order $1+\theta$ and denote (footnote 7)

$$H^{\uparrow}_{1+\theta}(X|Y) := \max_{Q_Y \in \mathcal{P}(\mathcal{Y})} H_{1+\theta}(P_{XY}|Q_Y) \tag{8}$$

$$= H_{1+\theta}(P_{XY}|P_Y^{(1+\theta)}) \tag{9}$$

$$= -\frac{1+\theta}{\theta} \log \sum_{y} P_Y(y) \left[\sum_{x} P_{X|Y}(x|y)^{1+\theta}\right]^{\frac{1}{1+\theta}}, \tag{10}$$

where the expression (10) is the same as Arimoto's proposal for the conditional Rényi entropy [33] and

$$P_Y^{(1+\theta)}(y) := \frac{\left[\sum_{x} P_{XY}(x,y)^{1+\theta}\right]^{\frac{1}{1+\theta}}}{\sum_{y'} \left[\sum_{x} P_{XY}(x,y')^{1+\theta}\right]^{\frac{1}{1+\theta}}}. \tag{11}$$
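The equalities (8) to (10) can be checked numerically: plugging the distribution (11) into (2) should reproduce the Arimoto form (10), and any other choice of $Q_Y$ should give a value no larger. The following sketch is illustrative; the function names and the nested-list representation `P[x][y]` are assumptions, and all entries of $P_{XY}$ are assumed positive.

```python
import math

def cond_renyi_relative(P, Q, theta):
    """Eq. (2): H_{1+theta}(P_XY | Q_Y), natural logarithm."""
    total = sum(P[x][y] ** (1 + theta) * Q[y] ** (-theta)
                for x in range(len(P)) for y in range(len(Q)))
    return -math.log(total) / theta

def tilted_marginal(P, theta):
    """Eq. (11): the maximizing distribution P_Y^{(1+theta)}."""
    a = [sum(row[y] ** (1 + theta) for row in P) ** (1 / (1 + theta))
         for y in range(len(P[0]))]
    s = sum(a)
    return [ay / s for ay in a]

def upper_cond_renyi(P, theta):
    """Eq. (10): Arimoto's form of the upper conditional Renyi entropy."""
    PY = [sum(row[y] for row in P) for y in range(len(P[0]))]
    s = sum(PY[y] * sum((row[y] / PY[y]) ** (1 + theta) for row in P) ** (1 / (1 + theta))
            for y in range(len(PY)))
    return -(1 + theta) / theta * math.log(s)
```

Because the maximizer is available in closed form, no numerical optimization over $Q_Y$ is needed to evaluate (8).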


For this measure, we also have properties similar to Lemma 1.

Lemma 2 ([30], [45], Ml) We have

$$\lim_{\theta \to 0} H^{\uparrow}_{1+\theta}(X|Y) = H(X|Y) \tag{12}$$

and

$$\lim_{\theta \to 0} \frac{2\left[H(X|Y) - H^{\uparrow}_{1+\theta}(X|Y)\right]}{\theta} = V(X|Y). \tag{13}$$

□

4) Properties of conditional Rényi entropies: When we derive converse bounds, we need to consider the case in which the order of the Rényi entropy and the order of the conditioning distribution defined in (11) are different. For this purpose, we introduce the two-parameter conditional Rényi entropy:

$$H_{1+\theta,1+\theta'}(X|Y) := H_{1+\theta}(P_{XY}|P_Y^{(1+\theta')}) \tag{14}$$

$$= -\frac{1}{\theta} \log \sum_{y} P_Y(y) \left[\sum_{x} P_{X|Y}(x|y)^{1+\theta}\right] \left[\sum_{x} P_{X|Y}(x|y)^{1+\theta'}\right]^{-\frac{\theta}{1+\theta'}} + \frac{\theta'}{1+\theta'} H^{\uparrow}_{1+\theta'}(X|Y). \tag{15}$$

The measures defined above have the following properties:

Lemma 3 ([30], [45], [44])

1) For fixed $Q_Y$, $\theta H_{1+\theta}(P_{XY}|Q_Y)$ is a concave function of $\theta$, and it is strictly concave iff $\mathrm{Var}\left[\log \frac{Q_Y(Y)}{P_{XY}(X,Y)}\right] > 0$.

2) For fixed $Q_Y$, $H_{1+\theta}(P_{XY}|Q_Y)$ is a monotonically decreasing (footnote 8) function of $\theta$.

3) The function $\theta H^{\downarrow}_{1+\theta}(X|Y)$ is a concave function of $\theta$, and it is strictly concave iff $V(X|Y) > 0$.

4) $H^{\downarrow}_{1+\theta}(X|Y)$ is a monotonically decreasing function of $\theta$, and it is strictly monotonically decreasing iff $V(X|Y) > 0$.

5) The function $\theta H^{\uparrow}_{1+\theta}(X|Y)$ is a concave function of $\theta$, and it is strictly concave iff $V(X|Y) > 0$.

6) $H^{\uparrow}_{1+\theta}(X|Y)$ is a monotonically decreasing function of $\theta$, and it is strictly monotonically decreasing iff $V(X|Y) > 0$.

7) For every $\theta \in (-1,0) \cup (0,\infty)$, we have $H^{\downarrow}_{1+\theta}(X|Y) \le H^{\uparrow}_{1+\theta}(X|Y)$.

8) For fixed $\theta'$, the function $\theta H_{1+\theta,1+\theta'}(X|Y)$ is a concave function of $\theta$, and it is strictly concave iff $V(X|Y) > 0$.

9) For fixed $\theta'$, $H_{1+\theta,1+\theta'}(X|Y)$ is a monotonically decreasing function of $\theta$.

10) We have

$$H_{1+\theta,1}(X|Y) = H^{\downarrow}_{1+\theta}(X|Y). \tag{16}$$

11) We have

$$H_{1+\theta,1+\theta}(X|Y) = H^{\uparrow}_{1+\theta}(X|Y). \tag{17}$$

12) For every $\theta \in (-1,0) \cup (0,\infty)$, $H_{1+\theta,1+\theta'}(X|Y)$ is maximized at $\theta' = \theta$.

5) Functions related to lower conditional Rényi entropy: Since Item 3) of Lemma 3 guarantees that the function $\theta \mapsto \frac{d[\theta H^{\downarrow}_{1+\theta}(X|Y)]}{d\theta}$ is strictly monotone decreasing, we can define the inverse functions (footnote 9) $\theta(a) = \theta^{\downarrow}(a)$ and $a(R) = a^{\downarrow}(R)$ by

$$\left.\frac{d[\theta H^{\downarrow}_{1+\theta}(X|Y)]}{d\theta}\right|_{\theta=\theta(a)} = a \tag{18}$$

and

$$(1+\theta(a(R)))\,a(R) - \theta(a(R))\,H^{\downarrow}_{1+\theta(a(R))}(X|Y) = R, \tag{19}$$

for $R(\underline{a}) < R < H^{\downarrow}_0(X|Y)$, where $\underline{a} = \underline{a}^{\downarrow} := \lim_{\theta \to \infty} \frac{d[\theta H^{\downarrow}_{1+\theta}(X|Y)]}{d\theta}$.

6) Functions related to upper conditional Rényi entropy: For $\theta H^{\uparrow}_{1+\theta}(X|Y)$, we also introduce the inverse functions $\theta(a) = \theta^{\uparrow}(a)$ and $a(R) = a^{\uparrow}(R)$ by

$$\left.\frac{d[\theta H^{\uparrow}_{1+\theta}(X|Y)]}{d\theta}\right|_{\theta=\theta(a)} = a \tag{20}$$

and

$$(1+\theta(a(R)))\,a(R) - \theta(a(R))\,H^{\uparrow}_{1+\theta(a(R))}(X|Y) = R, \tag{21}$$

for $R(\underline{a}) < R < H^{\uparrow}_0(X|Y)$, where $\underline{a} = \underline{a}^{\uparrow} := \lim_{\theta \to \infty} \frac{d[\theta H^{\uparrow}_{1+\theta}(X|Y)]}{d\theta}$.

7 For $-1 < \theta < 0$, (9) can be proved by using the Hölder inequality, and, for $0 < \theta$, (9) can be proved by using the reverse Hölder inequality [m, Lemma 8].

8 Technically, $H_{1+\theta}(P_{XY}|Q_Y)$ is always non-increasing, and it is monotonically decreasing iff strict concavity holds in Statement 1. Similar remarks also apply to the other information measures throughout the paper.

9 Throughout the paper, the notations $\theta(a)$ and $a(R)$ are reused for several inverse functions. Although the meanings of those notations are obvious from the context, we occasionally put the superscript $\downarrow$ or $\uparrow$ to emphasize that those inverse functions are induced from the corresponding conditional Rényi entropies. This definition is related to the Legendre transform of the concave function $\theta \mapsto \theta H^{\downarrow}_{1+\theta}(X|Y)$. For details, see [30].
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B. Information Measures for Transition Matrix 

1) Conditions for transition matrices: Let 

{W{x,y\x' ,y')}(^ y): {x . y)) e(Xxy) 2 be an ergodic and 
irreducible transition matrix. The purpose of this section is 
to introduce transition matrix counterparts of those measures 
in Section III-AI For this purpose, we first need to introduce 
some assumptions on transition matrices: 


Assumption 1 (Non-Hidden [30], [44], [45]) We say that a transition matrix $W$ is non-hidden (with respect to $\mathcal{Y}$) if

$$\sum_x W(x,y|x',y') = W_Y(y|y') \tag{22}$$

for every $x' \in \mathcal{X}$ and $y, y' \in \mathcal{Y}$.


Assumption 2 (Strongly Non-Hidden) We say that a transition matrix $W$ is strongly non-hidden (with respect to $\mathcal{Y}$) if, for every $\theta \in (-1, \infty)$ and $y, y' \in \mathcal{Y}$,

$$W_{Y,\theta}(y|y') := \sum_x W(x,y|x',y')^{1+\theta} \tag{23}$$

is well defined, i.e., the right hand side of (23) is independent of $x'$.

Assumption 1 requires (23) to hold only for $\theta = 0$, and thus Assumption 2 implies Assumption 1. However, Assumption 2 is a strictly stronger condition than Assumption 1. For example, let us consider the case in which the transition matrix has a product form, i.e., $W(x,y|x',y') = W_X(x|x')W_Y(y|y')$. In this case, Assumption 1 is obviously satisfied. However, Assumption 2 is not satisfied in general.

Assumption 1 means that we can decompose $W(x,y|x',y')$ as

$$W(x,y|x',y') = W_Y(y|y')\,W_{X|X',Y',Y}(x|x',y',y). \tag{24}$$

Thus, Assumption 2 can be rephrased as the condition that

$$\sum_x W_{X|X',Y',Y}(x|x',y',y)^{1+\theta} \tag{25}$$

does not depend on $x'$. By taking $\theta$ sufficiently large, we find that the largest value of $W_{X|X',Y',Y}(x|x',y',y)$ does not depend on $x'$. By repeating this argument for the second largest value of $W_{X|X',Y',Y}(x|x',y',y)$ and so on, we eventually find that Assumption 2 is satisfied iff, for every $x' \neq \tilde{x}'$, there exists a permutation $\pi$ on $\mathcal{X}$ such that $W_{X|X',Y',Y}(x|x',y',y) = W_{X|X',Y',Y}(\pi(x)|\tilde{x}',y',y)$.

Non-trivial examples satisfying Assumption [I] and Assump¬ 
tion |2] are given in ED. 
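The two assumptions can be checked mechanically for a small transition matrix. The following is a minimal sketch, not part of the original analysis: the storage layout `W[x][y][xp][yp]` for $W(x,y|x',y')$, the function names, and the numerical examples are all assumptions made for illustration. Assumption 2 is tested through the permutation criterion stated above, i.e., for fixed $(y, y')$ the sorted values of $W(\cdot, y|x', y')$ must not depend on $x'$.

```python
from itertools import product

def product_form(WX, WY):
    """Build W[x][y][xp][yp] = W_X(x|x') * W_Y(y|y') (hypothetical layout)."""
    nx, ny = len(WX), len(WY)
    return [[[[WX[x][xp] * WY[y][yp] for yp in range(ny)]
              for xp in range(nx)] for y in range(ny)] for x in range(nx)]

def is_non_hidden(W, tol=1e-12):
    """Assumption 1: sum_x W(x,y|x',y') must not depend on x'."""
    nx, ny = len(W), len(W[0])
    for y, yp in product(range(ny), repeat=2):
        sums = [sum(W[x][y][xp][yp] for x in range(nx)) for xp in range(nx)]
        if max(sums) - min(sums) > tol:
            return False
    return True

def is_strongly_non_hidden(W, tol=1e-12):
    """Assumption 2 via the permutation criterion: for fixed (y, y'), the
    sorted values of W(., y | x', y') must be the same for every x'."""
    nx, ny = len(W), len(W[0])
    for y, yp in product(range(ny), repeat=2):
        ref = sorted(W[x][y][0][yp] for x in range(nx))
        for xp in range(1, nx):
            col = sorted(W[x][y][xp][yp] for x in range(nx))
            if any(abs(a - b) > tol for a, b in zip(col, ref)):
                return False
    return True

# Illustrative matrices; entry [i][j] is the probability of i given j.
W_Y = [[0.7, 0.4], [0.3, 0.6]]
W_X_generic = [[0.9, 0.5], [0.1, 0.5]]    # columns are not permutations
W_X_permuted = [[0.9, 0.1], [0.1, 0.9]]   # columns are permutations
```

With these examples, the generic product form is non-hidden but not strongly non-hidden, while the permuted one satisfies both assumptions, in line with the product-form discussion above.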


2) Lower conditional Rényi entropy $H^{\downarrow,W}_{1+\theta}(X|Y)$: First, we introduce information measures under Assumption 1. In order to define a transition matrix counterpart of the single-shot lower conditional Rényi entropy, let us introduce the following tilted matrix:

$$W_\theta(x,y|x',y') := W(x,y|x',y')^{1+\theta}\,W_Y(y|y')^{-\theta}. \tag{26}$$

Here, we should notice that the tilted matrix $W_\theta$ is not normalized, i.e., it is not a transition matrix. Let $\lambda_\theta$ be the Perron-Frobenius eigenvalue of $W_\theta$ and $\tilde{P}_{\theta,XY}$ be its normalized eigenvector. Then, we define the lower conditional Rényi entropy for $W$ by

$$H^{\downarrow,W}_{1+\theta}(X|Y) := -\frac{1}{\theta} \log \lambda_\theta, \tag{27}$$

where $\theta \in (-1,0) \cup (0,\infty)$. For $\theta = 0$, we define the lower conditional Rényi entropy for $W$ by

$$H^{\downarrow,W}_{1}(X|Y) := \lim_{\theta\to 0} H^{\downarrow,W}_{1+\theta}(X|Y). \tag{28}$$

When we define the conditional entropy $H^W(X|Y)$ for $W$ by using the stationary distribution $\tilde{P}_{0,XY}$ as

$$H^W(X|Y) := -\sum_{x',y'} \tilde{P}_{0,XY}(x',y') \sum_{x,y} W(x,y|x',y') \log \frac{W(x,y|x',y')}{W_Y(y|y')},$$

as shown below, we have

$$H^W(X|Y) = H^{\downarrow,W}_{1}(X|Y). \tag{29}$$

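Taking the derivative with respect to $\theta$, we can show (29) as follows:

$$\begin{aligned}
H^{\downarrow,W}_{1}(X|Y) &= \frac{d[\theta H^{\downarrow,W}_{1+\theta}(X|Y)]}{d\theta}\bigg|_{\theta=0} = -\frac{d\lambda_\theta}{d\theta}\bigg|_{\theta=0} \\
&= -\frac{d}{d\theta} \sum_{x,y,x',y'} W_\theta(x,y|x',y')\,\tilde{P}_{\theta,XY}(x',y')\bigg|_{\theta=0} \\
&= -\sum_{x,y,x',y'} \frac{d}{d\theta} W_\theta(x,y|x',y')\bigg|_{\theta=0} \tilde{P}_{0,XY}(x',y') - \sum_{x,y,x',y'} W_0(x,y|x',y')\,\frac{d}{d\theta}\tilde{P}_{\theta,XY}(x',y')\bigg|_{\theta=0} \\
&= -\sum_{x,y,x',y'} \tilde{P}_{0,XY}(x',y')\,W(x,y|x',y') \log \frac{W(x,y|x',y')}{W_Y(y|y')} - \frac{d}{d\theta} \sum_{x,y,x',y'} W(x,y|x',y')\,\tilde{P}_{\theta,XY}(x',y')\bigg|_{\theta=0} \\
&= H^W(X|Y),
\end{aligned}$$

where the second equality uses $\lambda_0 = 1$, and the final equation follows from the relation $\sum_{x,y,x',y'} W(x,y|x',y')\,\tilde{P}_{\theta,XY}(x',y') = 1$.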

As a counterpart of (7), we also define

$$V^W(X|Y) := \lim_{\theta\to 0} \frac{2\left[H^W(X|Y) - H^{\downarrow,W}_{1+\theta}(X|Y)\right]}{\theta}. \tag{30}$$


10 The reason of the name “non-hidden” is the following. In general, the 
random variable Y is subject to a hidden Markov process. However, when the 
condition (22) holds, the random variable Y is subject to a Markov process. 
Hence, we call the condition (22) non-hidden. 


Remark 1 When a transition matrix $W$ satisfies Assumption 2, $H^{\downarrow,W}_{1+\theta}(X|Y)$ can be written as

$$H^{\downarrow,W}_{1+\theta}(X|Y) = -\frac{1}{\theta} \log \lambda'_\theta, \tag{31}$$

where $\lambda'_\theta$ is the Perron-Frobenius eigenvalue of $W_{Y,\theta}(y|y')\,W_Y(y|y')^{-\theta}$. In fact, for the left Perron-Frobenius eigenvector $\hat{Q}_\theta$ of $W_{Y,\theta}(y|y')\,W_Y(y|y')^{-\theta}$, we have

$$\sum_{x,y} \hat{Q}_\theta(y)\,W(x,y|x',y')^{1+\theta}\,W_Y(y|y')^{-\theta} = \lambda'_\theta\,\hat{Q}_\theta(y'), \tag{32}$$

which implies that $\lambda'_\theta$ is the Perron-Frobenius eigenvalue of $W_\theta$. Consequently, we can evaluate $H^{\downarrow,W}_{1+\theta}(X|Y)$ by calculating the Perron-Frobenius eigenvalue of a $|\mathcal{Y}| \times |\mathcal{Y}|$ matrix instead of a $|\mathcal{X}||\mathcal{Y}| \times |\mathcal{X}||\mathcal{Y}|$ matrix when $W$ satisfies Assumption 2.
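As an illustration of (27) and of the reduction in Remark 1, the following sketch evaluates $H^{\downarrow,W}_{1+\theta}(X|Y)$ once from the full tilted matrix (26) and once from the $|\mathcal{Y}| \times |\mathcal{Y}|$ matrix $W_{Y,\theta}(y|y')W_Y(y|y')^{-\theta}$. The layout `W[x][y][xp][yp]`, the helper names, and the use of plain power iteration (which presumes a primitive matrix) are assumptions made for this example only.

```python
import math

def perron_eigenvalue(A, iters=5000, tol=1e-14):
    """Perron-Frobenius eigenvalue of a primitive non-negative matrix,
    estimated by power iteration with the iterate normalized to sum 1."""
    n = len(A)
    v = [1.0 / n] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        new_lam = sum(w)                 # since sum(v) = 1, this estimates lambda
        v = [wi / new_lam for wi in w]
        if abs(new_lam - lam) < tol:
            break
        lam = new_lam
    return new_lam

def marginal(W, y, yp):
    """W_Y(y|y'); under Assumption 1 any x' gives the same value, so x' = 0 is used."""
    return sum(W[x][y][0][yp] for x in range(len(W)))

def lower_renyi_full(W, theta):
    """Eq. (27) evaluated with the full |X||Y| x |X||Y| tilted matrix (26)."""
    nx, ny = len(W), len(W[0])
    idx = [(x, y) for x in range(nx) for y in range(ny)]
    A = []
    for (x, y) in idx:
        row = []
        for (xp, yp) in idx:
            w = W[x][y][xp][yp]
            row.append(w ** (1 + theta) * marginal(W, y, yp) ** (-theta) if w > 0 else 0.0)
        A.append(row)
    return -math.log(perron_eigenvalue(A)) / theta

def lower_renyi_reduced(W, theta):
    """Eq. (31): the same quantity from a |Y| x |Y| matrix (needs Assumption 2)."""
    nx, ny = len(W), len(W[0])
    A = [[0.0] * ny for _ in range(ny)]
    for y in range(ny):
        for yp in range(ny):
            wy = marginal(W, y, yp)
            if wy > 0:
                wy_theta = sum(W[x][y][0][yp] ** (1 + theta) for x in range(nx))
                A[y][yp] = wy_theta * wy ** (-theta)
    return -math.log(perron_eigenvalue(A)) / theta
```

For an i.i.d. source, i.e., $W(x|x') = P(x)$ with $\mathcal{Y}$ a singleton, the tilted matrix has rank one and the value reduces to the ordinary Rényi entropy of $P$, which gives a convenient sanity check.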

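3) Upper conditional Rényi entropy $H^{\uparrow,W}_{1+\theta}(X|Y)$: Next, we introduce information measures under Assumption 2. In order to define a transition matrix counterpart of the single-shot upper conditional Rényi entropy, let us introduce the following $|\mathcal{Y}| \times |\mathcal{Y}|$ matrix:

$$K_\theta(y|y') := W_{Y,\theta}(y|y')^{\frac{1}{1+\theta}}, \tag{33}$$

where $W_{Y,\theta}$ is defined by (23). Let $\kappa_\theta$ be the Perron-Frobenius eigenvalue of $K_\theta$. Then, we define the upper conditional Rényi entropy for $W$ by

$$H^{\uparrow,W}_{1+\theta}(X|Y) := -\frac{1+\theta}{\theta} \log \kappa_\theta, \tag{34}$$

where $\theta \in (-1,0) \cup (0,\infty)$.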


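Lemma 4 ([30, Lemma 5]) We have

$$\lim_{\theta\to 0} H^{\uparrow,W}_{1+\theta}(X|Y) = H^W(X|Y) \tag{35}$$

and

$$\lim_{\theta\to 0} \frac{2\left[H^W(X|Y) - H^{\uparrow,W}_{1+\theta}(X|Y)\right]}{\theta} = V^W(X|Y). \tag{36}$$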


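Now, let us introduce a transition matrix counterpart of the two-parameter conditional Rényi entropy. For this purpose, we introduce the following $|\mathcal{Y}| \times |\mathcal{Y}|$ matrix:

$$N_{\theta,\theta'}(y|y') := W_{Y,\theta}(y|y')\,W_{Y,\theta'}(y|y')^{-\frac{\theta}{1+\theta'}}. \tag{37}$$

Let $\nu_{\theta,\theta'}$ be the Perron-Frobenius eigenvalue of $N_{\theta,\theta'}$. Then, we define the two-parameter conditional Rényi entropy by

$$H^{W}_{1+\theta,1+\theta'}(X|Y) := -\frac{1}{\theta} \log \nu_{\theta,\theta'} + \frac{\theta'}{1+\theta'} H^{\uparrow,W}_{1+\theta'}(X|Y). \tag{38}$$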


Remark 2 Although we defined H^'^(X\Y) and 
H\^(X\Y) by < l27l) and (l34t respectively, we can 
alternatively define these measures in the same spirit 
as the single-shot setting by introducing a transition 
matrix counterpart of Hi + g(Pxy\Qy) as follows. 
For the marginal W Y (y\y') of W(x, y\x', y'), let 
y%v Y := {(j/, l/) : W(y\y') > 0}. For another transition 
matrix Wy on y, we define V3— in a similar manner. For 
Wy satisfying V&y C we defin^lJ 

ff WIWr (x|y) ;= _ 1 lQg X W\W Y m 


11 Although we can also define H^}^ Vy (X|y) even if y^ Y C y2- is 
not satisfied (see o for the detail), for our purpose of defining (X|y) 

and (X\Y), other cases are irrelevant. 


for 9 G (—1,0) U (0,oo), where A ^\ Wy j s the Perron- 
Frobenius eigenvalue of 

W{x,y\x',y') 1+e W Y {y\y')- e - (40) 

By using this measure, we obviously have 

(X\Y) = HY^ V (X|F). (41) 

Furthermore, under Assumption [2] the relation 

H\f e (X\Y)= max H^f Y (X\Y) (42) 

Wy 

holds m (62)], where the maximum is taken over all transi¬ 
tion matrices satisfying C ■ 


4) Properties of conditional Renyi entropies: The informa¬ 
tion measures introduced in this section have the following 
properties: 


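Lemma 5 ([30, Lemma 6])

1) The function $\theta H^{\downarrow,W}_{1+\theta}(X|Y)$ is a concave function of $\theta$, and it is strictly concave iff $V^W(X|Y) > 0$.

2) $H^{\downarrow,W}_{1+\theta}(X|Y)$ is a monotonically decreasing function of $\theta$, and it is strictly monotonically decreasing iff $V^W(X|Y) > 0$.

3) The function $\theta H^{\uparrow,W}_{1+\theta}(X|Y)$ is a concave function of $\theta$, and it is strictly concave iff $V^W(X|Y) > 0$.

4) $H^{\uparrow,W}_{1+\theta}(X|Y)$ is a monotonically decreasing function of $\theta$, and it is strictly monotonically decreasing iff $V^W(X|Y) > 0$.

5) For every $\theta \in (-1,0) \cup (0,\infty)$, we have $H^{\downarrow,W}_{1+\theta}(X|Y) \le H^{\uparrow,W}_{1+\theta}(X|Y)$.

6) For fixed $\theta'$, the function $\theta H^{W}_{1+\theta,1+\theta'}(X|Y)$ is a concave function of $\theta$, and it is strictly concave iff $V^W(X|Y) > 0$.

7) For fixed $\theta'$, $H^{W}_{1+\theta,1+\theta'}(X|Y)$ is a monotonically decreasing function of $\theta$.

8) We have

$$H^{W}_{1+\theta,1}(X|Y) = H^{\downarrow,W}_{1+\theta}(X|Y). \tag{43}$$

9) We have

$$H^{W}_{1+\theta,1+\theta}(X|Y) = H^{\uparrow,W}_{1+\theta}(X|Y). \tag{44}$$

10) For every $\theta \in (-1,0) \cup (0,\infty)$, $H^{W}_{1+\theta,1+\theta'}(X|Y)$ is maximized at $\theta' = \theta$, i.e.,

$$\frac{\partial H^{W}_{1+\theta,1+\theta'}(X|Y)}{\partial \theta'}\bigg|_{\theta'=\theta} = 0. \tag{45}$$

5) Functions related to $H^{\downarrow,W}_{1+\theta}(X|Y)$: From Statement 1 of Lemma 5, $\frac{d[\theta H^{\downarrow,W}_{1+\theta}(X|Y)]}{d\theta}$ is monotonically decreasing. Thus, we can define the inverse function $\theta(a) = \theta^{\downarrow}(a)$ of $\frac{d[\theta H^{\downarrow,W}_{1+\theta}(X|Y)]}{d\theta}$ by

$$\frac{d[\theta H^{\downarrow,W}_{1+\theta}(X|Y)]}{d\theta}\bigg|_{\theta=\theta(a)} = a \tag{46}$$

for $\underline{a} < a < \bar{a}$, where $\underline{a} = \underline{a}^{\downarrow} := \lim_{\theta\to\infty} \frac{d[\theta H^{\downarrow,W}_{1+\theta}(X|Y)]}{d\theta}$ and $\bar{a} = \bar{a}^{\downarrow} := \lim_{\theta\to -1} \frac{d[\theta H^{\downarrow,W}_{1+\theta}(X|Y)]}{d\theta}$. Then, due to the definition (46), we have the following lemma because the function $\theta \mapsto \theta H^{\downarrow,W}_{1+\theta}(X|Y)$ is concave.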

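Lemma 6 The function $\theta(R)$ defined in (46) satisfies

$$\theta(R)\,H^{\downarrow,W}_{1+\theta(R)}(X|Y) - \theta(R)\,R = \sup_{0 \le \theta} \left[\theta H^{\downarrow,W}_{1+\theta}(X|Y) - \theta R\right]. \tag{47}$$

Next, let

$$R(a) := (1 + \theta(a))\,a - \theta(a)\,H^{\downarrow,W}_{1+\theta(a)}(X|Y). \tag{48}$$

Since

$$\frac{dR}{da}(a) = 1 + \theta(a), \tag{49}$$

$R(a)$ is a monotonic increasing function of $\underline{a} < a < R(\bar{a})$. Thus, we can define the inverse function $a(R) = a^{\downarrow}(R)$ of $R(a)$ by

$$(1 + \theta(a(R)))\,a(R) - \theta(a(R))\,H^{\downarrow,W}_{1+\theta(a(R))}(X|Y) = R \tag{50}$$

for $R(\underline{a}) < R < H^{\downarrow,W}_0(X|Y)$, where $H^{\downarrow,W}_0(X|Y) := \lim_{\theta\to -1} H^{\downarrow,W}_{1+\theta}(X|Y)$.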
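The inverse functions $\theta(a)$ and $a(R)$ have no closed form in general, but they are easy to evaluate numerically because the defining maps are monotone. The sketch below is an illustration under stated assumptions: a hypothetical two-state chain with $\mathcal{Y}$ a singleton, a closed-form $2 \times 2$ Perron-Frobenius eigenvalue, a central difference for the derivative in (46), and bisection for the inversion. None of these choices come from the paper.

```python
import math

# Hypothetical two-state chain: W[x][xp] = W(x|x'), columns sum to 1.
W = [[0.9, 0.3],
     [0.1, 0.7]]

def phi(theta):
    """phi(theta) = theta * H^W_{1+theta}(X) = -log(lambda_theta)."""
    a, b = W[0][0] ** (1 + theta), W[0][1] ** (1 + theta)
    c, d = W[1][0] ** (1 + theta), W[1][1] ** (1 + theta)
    lam = ((a + d) + math.sqrt((a - d) ** 2 + 4 * b * c)) / 2
    return -math.log(lam)

def dphi(theta, h=1e-5):
    """Central-difference approximation of d[theta H]/d theta."""
    return (phi(theta + h) - phi(theta - h)) / (2 * h)

def solve_decreasing(f, target, lo=-0.99, hi=50.0):
    """Bisection for f(theta) = target when f is decreasing on [lo, hi]."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def theta_of_a(a):
    """Inverse function theta(a) of (46)."""
    return solve_decreasing(dphi, a)

def R_of_a(a):
    """R(a) of (48)."""
    t = theta_of_a(a)
    return (1 + t) * a - phi(t)

def a_of_R(R):
    """Inverse function a(R) of (50); uses that (1+t)*phi'(t) - phi(t) is decreasing in t."""
    t = solve_decreasing(lambda s: (1 + s) * dphi(s) - phi(s), R)
    return dphi(t)
```

The monotonicity used in `a_of_R` follows from (49): $R$ is increasing in $a$ and $\theta(a)$ is decreasing in $a$, so $R$ is decreasing as a function of $\theta$ on $\theta > -1$.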

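Due to (30), when $\theta(a)$ is close to 0, we have

$$\theta(a)\,H^{\downarrow,W}_{1+\theta(a)}(X|Y) = \theta(a)\,H^W(X|Y) - \frac{1}{2} V^W(X|Y)\,\theta(a)^2 + o(\theta(a)^2). \tag{51}$$

Taking the derivative, (46) implies that

$$a = H^W(X|Y) - V^W(X|Y)\,\theta(a) + o(\theta(a)). \tag{52}$$

Hence, when $R$ is close to $H^W(X|Y)$, we have

$$\begin{aligned}
R &= (1 + \theta(a(R)))\,a(R) - \theta(a(R))\,H^{\downarrow,W}_{1+\theta(a(R))}(X|Y) \\
&= H^W(X|Y) - \left(1 + \frac{\theta(a(R))}{2}\right)\theta(a(R))\,V^W(X|Y) + o(\theta(a(R))),
\end{aligned} \tag{53}$$

i.e.,

$$\theta(a(R)) = -\frac{R - H^W(X|Y)}{V^W(X|Y)} + o\left(\frac{R - H^W(X|Y)}{V^W(X|Y)}\right). \tag{54}$$

Further, Eqs. (51) and (52) imply

$$\begin{aligned}
-\theta(a(R))\,a(R) + \theta(a(R))\,H^{\downarrow,W}_{1+\theta(a(R))}(X|Y)
&= V^W(X|Y)\,\frac{\theta(a(R))^2}{2} + o(\theta(a(R))^2) \\
&= \frac{V^W(X|Y)}{2}\left(\frac{R - H^W(X|Y)}{V^W(X|Y)}\right)^2 + o\left(\left(\frac{R - H^W(X|Y)}{V^W(X|Y)}\right)^2\right).
\end{aligned} \tag{55}$$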


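6) Functions related to $H^{\uparrow,W}_{1+\theta}(X|Y)$: For $\theta H^{\uparrow,W}_{1+\theta}(X|Y)$, by the same reason, we can define the inverse function $\theta(a) = \theta^{\uparrow}(a)$ by

$$\frac{d[\theta H^{W}_{1+\theta,1+\theta(a)}(X|Y)]}{d\theta}\bigg|_{\theta=\theta(a)} = \frac{d[\theta H^{\uparrow,W}_{1+\theta}(X|Y)]}{d\theta}\bigg|_{\theta=\theta(a)} = a \tag{56}$$

for $\underline{a} < a < \bar{a}$, where $\underline{a} = \underline{a}^{\uparrow} := \lim_{\theta\to\infty} \frac{d[\theta H^{\uparrow,W}_{1+\theta}(X|Y)]}{d\theta}$ and $\bar{a} = \bar{a}^{\uparrow} := \lim_{\theta\to -1} \frac{d[\theta H^{\uparrow,W}_{1+\theta}(X|Y)]}{d\theta}$. Here, the first equation in (56) follows from (45). We also define the inverse function $a(R) = a^{\uparrow}(R)$ of

$$R(a) := (1 + \theta(a))\,a - \theta(a)\,H^{\uparrow,W}_{1+\theta(a)}(X|Y) \tag{57}$$

by

$$(1 + \theta(a(R)))\,a(R) - \theta(a(R))\,H^{\uparrow,W}_{1+\theta(a(R))}(X|Y) = R \tag{58}$$

for $R(\underline{a}) < R < H^{\uparrow,W}_0(X|Y)$, where $H^{\uparrow,W}_0(X|Y) := \lim_{\theta\to -1} H^{\uparrow,W}_{1+\theta}(X|Y)$. Then, we can show the following lemma in the same way as Lemma 8 of [30].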


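Lemma 7 For $R(\underline{a}) < R < H^{\uparrow,W}_0(X|Y)$, we have

$$\sup_{\theta \ge 0} \frac{-\theta R + \theta H^{\uparrow,W}_{1+\theta}(X|Y)}{1+\theta} = -\theta(a(R))\,a(R) + \theta(a(R))\,H^{\uparrow,W}_{1+\theta(a(R))}(X|Y). \tag{59}$$

When the rate $R$ is larger than the critical rate $R_{\mathrm{cr}}$ defined by

$$R_{\mathrm{cr}} := R\left(\frac{d[\theta H^{\uparrow,W}_{1+\theta}(X|Y)]}{d\theta}\bigg|_{\theta=1}\right), \tag{60}$$

the definition (57) of $R(a) = R^{\uparrow}(a)$ yields

$$\sup_{0 \le \theta \le 1} \frac{-\theta R + \theta H^{\uparrow,W}_{1+\theta}(X|Y)}{1+\theta} = -\theta(a(R))\,a(R) + \theta(a(R))\,H^{\uparrow,W}_{1+\theta(a(R))}(X|Y). \tag{61}$$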


Remark 3 As we can find from (29), (30), and Lemma 4, both conditional Rényi entropies expand as

$$H^{\downarrow,W}_{1+\theta}(X|Y) = H^W(X|Y) - \frac{1}{2} V^W(X|Y)\,\theta + o(\theta), \tag{62}$$

$$H^{\uparrow,W}_{1+\theta}(X|Y) = H^W(X|Y) - \frac{1}{2} V^W(X|Y)\,\theta + o(\theta) \tag{63}$$

around $\theta = 0$. Thus, the difference between these measures appears significantly only when $|\theta|$ is rather large.


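Remark 4 When $\mathcal{Y}$ is a singleton, $H^{\downarrow,W}_{1+\theta}(X|Y)$ coincides with $H^{\uparrow,W}_{1+\theta}(X|Y)$. So, they are simply called the Rényi entropy and denoted by $H^{W}_{1+\theta}(X)$ for $W$. Likewise, $\theta^{\downarrow}(a)$, $a^{\downarrow}(R)$, $R^{\downarrow}(a)$, $\underline{a}^{\downarrow}$, and $\bar{a}^{\downarrow}$ coincide with $\theta^{\uparrow}(a)$, $a^{\uparrow}(R)$, $R^{\uparrow}(a)$, $\underline{a}^{\uparrow}$, and $\bar{a}^{\uparrow}$. They are simplified to $\theta(a)$, $a(R)$, $R(a)$, $\underline{a}$, and $\bar{a}$.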


C. Information Measures for Markov Chain

Let $(X, Y)$ be the Markov chain induced by a transition matrix $W$ and some initial distribution $P_{X_1 Y_1}$. Now, we show how the information measures introduced in Section II-B are related to the conditional Rényi entropy rates. First, we introduce the following lemma, which gives finite upper and lower bounds on the lower conditional Rényi entropy.

Lemma 8 ([30, Lemma 9]) Suppose that a transition matrix $W$ satisfies Assumption 1. Let $v_\theta$ be the eigenvector of $W_\theta^T$ with respect to the Perron-Frobenius eigenvalue $\lambda_\theta$ such that

$$\min_{x,y} v_\theta(x,y) = 1. \tag{64}$$

Let $w_\theta(x,y) := P_{X_1 Y_1}(x,y)^{1+\theta}\,P_{Y_1}(y)^{-\theta}$. Then, we have

$$(n-1)\,\theta H^{\downarrow,W}_{1+\theta}(X|Y) + \underline{\delta}(\theta) \le \theta H^{\downarrow}_{1+\theta}(X^n|Y^n) \le (n-1)\,\theta H^{\downarrow,W}_{1+\theta}(X|Y) + \bar{\delta}(\theta), \tag{65}$$

where

$$\bar{\delta}(\theta) := -\log \langle v_\theta | w_\theta \rangle + \log \max_{x,y} v_\theta(x,y), \tag{66}$$

$$\underline{\delta}(\theta) := -\log \langle v_\theta | w_\theta \rangle, \tag{67}$$

and $\langle v_\theta | w_\theta \rangle$ is defined as $\sum_{x,y} v_\theta(x,y)\,w_\theta(x,y)$.
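The finite-length bound (65) can be checked numerically in the single-terminal case, where $\theta H_{1+\theta}(X^n) = -\log \sum_{x^n} P(x^n)^{1+\theta}$ is computable exactly by repeated multiplication with the tilted matrix. The following is a sketch under assumptions of our own: a hypothetical positive two-state chain, a uniform initial distribution, and the closed-form left eigenvector of a $2 \times 2$ matrix. Here $\underline{\delta}(\theta)$ is read as $-\log\langle v_\theta|w_\theta\rangle$ and $\bar{\delta}(\theta)$ as the same quantity plus $\log\max v_\theta$.

```python
import math

# Hypothetical positive two-state chain and initial distribution.
W = [[0.9, 0.3],
     [0.1, 0.7]]      # W[x][xp] = W(x|x')
P1 = [0.5, 0.5]

def tilted(theta):
    """Tilted matrix W_theta(x|x') = W(x|x')^(1+theta) for a singleton Y."""
    return [[W[x][xp] ** (1 + theta) for xp in range(2)] for x in range(2)]

def perron_left(A):
    """PF eigenvalue lam of a positive 2x2 matrix and the left eigenvector v
    (sum_x v[x] A[x][xp] = lam v[xp]) scaled so that min(v) = 1."""
    (a, b), (c, d) = A
    lam = ((a + d) + math.sqrt((a - d) ** 2 + 4 * b * c)) / 2
    v = [c, lam - a]
    m = min(v)
    return lam, [vi / m for vi in v]

def theta_H_n(theta, n):
    """Exact theta * H_{1+theta}(X^n) via n-1 multiplications by W_theta."""
    A = tilted(theta)
    u = [p ** (1 + theta) for p in P1]
    for _ in range(n - 1):
        u = [sum(A[x][xp] * u[xp] for xp in range(2)) for x in range(2)]
    return -math.log(sum(u))

def lemma8_bounds(theta, n):
    """Lower and upper bounds of (65) on theta * H_{1+theta}(X^n)."""
    lam, v = perron_left(tilted(theta))
    w = [p ** (1 + theta) for p in P1]
    inner = sum(vi * wi for vi, wi in zip(v, w))
    lower = -(n - 1) * math.log(lam) - math.log(inner)
    upper = lower + math.log(max(v))
    return lower, upper
```

The cost of `lemma8_bounds` does not grow with `n`, which is the sense in which the bounds are computable independently of the block length; `theta_H_n` is included only to compare against the exact value for small `n`.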

From Lemma [8j we have the following. 

Theorem 1 ([30, Theorem 1]) Suppose that a transition matrix $W$ satisfies Assumption 1. For any initial distribution, we have

$$\lim_{n\to\infty} \frac{1}{n} H^{\downarrow}_{1+\theta}(X^n|Y^n) = H^{\downarrow,W}_{1+\theta}(X|Y), \tag{68}$$

$$\lim_{n\to\infty} \frac{1}{n} H(X^n|Y^n) = H^W(X|Y). \tag{69}$$

We also have the following asymptotic evaluation of the variance:


Theorem 2 ([30, Theorem 2]) Suppose that the transition matrix $W$ satisfies Assumption 1. For any initial distribution, we have

$$\lim_{n\to\infty} \frac{1}{n} V(X^n|Y^n) = V^W(X|Y). \tag{70}$$

Theorem [2] is practically important since the limit of the 
variance can be described by a single letter characterized 
quantity. A method to calculate V" (X\Y) can be found in 

Ell- 

Next, we show the lemma that gives finite upper and lower 
bound on the upper conditional Renyi entropy in terms of the 
upper conditional Renyi entropy for the transition matrix. 


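Lemma 9 ([30, Lemma 10]) Suppose that a transition matrix $W$ satisfies Assumption 2. Let $v_\theta$ be the eigenvector of $K_\theta^T$ with respect to the Perron-Frobenius eigenvalue $\kappa_\theta$ such that $\min_y v_\theta(y) = 1$. Let $w_{Y,\theta}$ be the $|\mathcal{Y}|$-dimensional vector defined by

$$w_{Y,\theta}(y) := \left[\sum_x P_{X_1 Y_1}(x,y)^{1+\theta}\right]^{\frac{1}{1+\theta}}. \tag{71}$$

Then, we have

$$(n-1)\frac{\theta}{1+\theta} H^{\uparrow,W}_{1+\theta}(X|Y) + \underline{\xi}(\theta) \le \frac{\theta}{1+\theta} H^{\uparrow}_{1+\theta}(X^n|Y^n) \le (n-1)\frac{\theta}{1+\theta} H^{\uparrow,W}_{1+\theta}(X|Y) + \bar{\xi}(\theta), \tag{72}$$

where

$$\bar{\xi}(\theta) := -\log \langle v_\theta | w_{Y,\theta} \rangle + \log \max_y v_\theta(y), \tag{73}$$

$$\underline{\xi}(\theta) := -\log \langle v_\theta | w_{Y,\theta} \rangle. \tag{74}$$

12 Since the eigenvector corresponding to the Perron-Frobenius eigenvalue of an irreducible non-negative matrix always has strictly positive entries [46, Theorem 8.4.4, p. 508], we can choose the eigenvector $v_\theta$ satisfying (64).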

From Lemma [9] we have the following. 


Theorem 3 ([30, Theorem 3]) Suppose that a transition matrix $W$ satisfies Assumption 2. For any initial distribution, we have

$$\lim_{n\to\infty} \frac{1}{n} H^{\uparrow}_{1+\theta}(X^n|Y^n) = H^{\uparrow,W}_{1+\theta}(X|Y). \tag{75}$$

Finally, we show the lemma that gives finite upper and 
lower bounds on the two-parameter conditional Renyi entropy 
in terms of the two-parameter conditional Renyi entropy for 
the transition matrix. 


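Lemma 10 ([30, Lemma 11]) Suppose that a transition matrix $W$ satisfies Assumption 2. Let $v_{\theta,\theta'}$ be the eigenvector of $N_{\theta,\theta'}^T$ with respect to the Perron-Frobenius eigenvalue $\nu_{\theta,\theta'}$ such that $\min_y v_{\theta,\theta'}(y) = 1$. Let $w_{\theta,\theta'}$ be the $|\mathcal{Y}|$-dimensional vector defined by

$$w_{\theta,\theta'}(y) := \left[\sum_x P_{X_1 Y_1}(x,y)^{1+\theta}\right] \left[\sum_x P_{X_1 Y_1}(x,y)^{1+\theta'}\right]^{-\frac{\theta}{1+\theta'}}. \tag{76}$$

Then, we have

$$(n-1)\,\theta H^{W}_{1+\theta,1+\theta'}(X|Y) + \underline{\zeta}(\theta,\theta') \le \theta H_{1+\theta,1+\theta'}(X^n|Y^n) \le (n-1)\,\theta H^{W}_{1+\theta,1+\theta'}(X|Y) + \bar{\zeta}(\theta,\theta'), \tag{77}$$

where

$$\bar{\zeta}(\theta,\theta') := -\log \langle v_{\theta,\theta'} | w_{\theta,\theta'} \rangle + \log \max_y v_{\theta,\theta'}(y) + \theta\,\bar{\xi}(\theta'), \tag{78}$$

$$\underline{\zeta}(\theta,\theta') := -\log \langle v_{\theta,\theta'} | w_{\theta,\theta'} \rangle + \theta\,\underline{\xi}(\theta') \tag{79}$$

for $\theta > 0$ and

$$\bar{\zeta}(\theta,\theta') := -\log \langle v_{\theta,\theta'} | w_{\theta,\theta'} \rangle + \log \max_y v_{\theta,\theta'}(y) + \theta\,\underline{\xi}(\theta'), \tag{80}$$

$$\underline{\zeta}(\theta,\theta') := -\log \langle v_{\theta,\theta'} | w_{\theta,\theta'} \rangle + \theta\,\bar{\xi}(\theta') \tag{81}$$

for $\theta < 0$.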






From Lemma 10, we have the following.

Theorem 4 ([30, Theorem 4]) Suppose that a transition matrix $W$ satisfies Assumption 2. For any initial distribution, we have

$$\lim_{n\to\infty} \frac{1}{n} H_{1+\theta,1+\theta'}(X^n|Y^n) = H^{W}_{1+\theta,1+\theta'}(X|Y). \tag{82}$$

D. Analysis with $\theta = \infty$: One-terminal case

To close this section, we address the case $\theta = \infty$, which was not discussed in the paper [30]. Since the conditional Rényi entropy is monotonically decreasing in $\theta$, the conditional Rényi entropy with $\theta = \infty$ is often called the conditional min-entropy. To avoid difficulty, we first consider the case when $\mathcal{Y}$ is a singleton.

For a single-shot random variable, we have

$$\lim_{\theta\to\infty} H_{1+\theta}(X) = H_\infty(X) \tag{83}$$

$$:= -\log \max_x P_X(x), \tag{84}$$

which is usually called the min-entropy. For each $x \in \mathcal{X}$, let $\mathcal{C}_x$ be the set of all Hamilton cycles from $x$ to itself. For a path $c = (x_1, x_2, \ldots, x_k)$, we define the set $\tilde{c} := \{(x_i, x_{i+1})\}_{i=1}^{k-1}$ and the number $|c|$ to be the number of edges in the cycle $c$, which is the number of elements in the set $\tilde{c}$. Then, we define the min-entropy for $W$ by

$$H^W_\infty(X) := -\log \max_{x \in \mathcal{X}} \max_{c \in \mathcal{C}_x} \left(\prod_{(x_a,x_b) \in \tilde{c}} W(x_b|x_a)\right)^{1/|c|}, \tag{85}$$

which is characterized as follows.

Lemma 11 We have

$$\lim_{\theta\to\infty} H^W_{1+\theta}(X) = H^W_\infty(X). \tag{86}$$

Proof: See Appendix C. ■

We also have the following lemma.


Lemma 12 For ( x,x '), let C XyX ' be the set of all Hamilton 
paths from x to x'. Then, let 

A := min max TT W(xb\x a ). (87) 

(x,x') CdzCx x' 

x^x' ’ (x a ,Xb)Gc 

Furthermore, let x* and c* £ C x » be such that H^(X) is 
achieved in (f85l) . Then, we have 

(n-l)H%{X) + 6 00 <H 00 (X n ) 

<(n-l)H^(X) + 6 00 , ( 88 ) 

where 

Soo :=|c* \H™ (X) - log max P Xl (x) 

X 

— logmin(A, (89) 

ioo := - log max P Xl (x) + log A. (90) 

X 

Proof: See Appendix iBl ■ 

From Lemma 12, we can derive the following.

Theorem 5 For any initial distribution, we have

$$\lim_{n\to\infty} \frac{1}{n} H_\infty(X^n) = H^W_\infty(X). \tag{91}$$


E. Analysis with $\theta = \infty$: Two-terminal case

Next, we proceed to the two-terminal case. For single-shot random variables $X$ and $Y$, we can derive the following.

Lemma 13 ([32]) We have

$$\lim_{\theta\to\infty} H^{\uparrow}_{1+\theta}(X|Y) = H^{\uparrow}_\infty(X|Y) \tag{92}$$

$$:= -\log \sum_y P_Y(y) \max_x P_{X|Y}(x|y), \tag{93}$$

$$\lim_{\theta\to\infty} H^{\downarrow}_{1+\theta}(X|Y) = H^{\downarrow}_\infty(X|Y) \tag{94}$$

$$:= -\log \max_{y \in \mathrm{supp}(P_Y)} \max_x P_{X|Y}(x|y). \tag{95}$$

We define the lower min-entropy for $W$ by

$$H^{\downarrow,W}_\infty(X|Y) := -\log \max_{(x,y) \in \mathcal{X}\times\mathcal{Y}} \max_{c \in \mathcal{C}_{(x,y)}} \left(\prod_{((x',y'),(x,y)) \in \tilde{c}} W_{X|X',Y',Y}(x|x',y',y)\right)^{1/|c|}. \tag{96}$$

Then, similar to Lemma 11, we can show the following lemma.

Lemma 14 We have

$$\lim_{\theta\to\infty} H^{\downarrow,W}_{1+\theta}(X|Y) = H^{\downarrow,W}_\infty(X|Y). \tag{97}$$

Next, we consider the upper min-entropy for $W$. When $W$ satisfies Assumption 2, we note that

$$T(y|y') := \max_x W_{X|X',Y',Y}(x|x',y',y) \tag{98}$$

is well defined, i.e., the right hand side of (98) is independent of $x'$. Let $\kappa_\infty$ be the Perron-Frobenius eigenvalue of $W_Y(y|y')\,T(y|y')$. Then, we define

$$H^{\uparrow,W}_\infty(X|Y) := -\log \kappa_\infty. \tag{99}$$

Lemma 15 We have

$$\lim_{\theta\to\infty} H^{\uparrow,W}_{1+\theta}(X|Y) = H^{\uparrow,W}_\infty(X|Y). \tag{100}$$

Proof: See Appendix D. ■

Theorem 6 Suppose that a transition matrix $W$ satisfies Assumption 1. For any initial distribution, we have

$$\lim_{n\to\infty} \frac{1}{n} H^{\downarrow}_\infty(X^n|Y^n) = H^{\downarrow,W}_\infty(X|Y). \tag{101}$$

Suppose that a transition matrix $W$ satisfies Assumption 2. For any initial distribution, we have

$$\lim_{n\to\infty} \frac{1}{n} H^{\uparrow}_\infty(X^n|Y^n) = H^{\uparrow,W}_\infty(X|Y). \tag{102}$$

Proof: See Appendix E. ■

TABLE II
Summary of the bounds for the uniform random number generation.

| Ach./Conv.    | Markov     | Single Shot   | $\Delta$, $\bar{\Delta}$, $D$, $\bar{D}$ | Complexity | Large Deviation | Moderate Deviation | Second Order | RER Rate |
| Achievability | Theorem 10 | Lemma 19      | $\bar{\Delta}$ | $O(1)$ | ✓* | ✓ |   |   |
|               |            | Lemma 18      | $\bar{\Delta}$ | Tail   |    | ✓ | ✓ |   |
|               | Theorem 13 | Theorem 9     | $\bar{D}$      | $O(1)$ |    |   |   | ✓ |
| Converse      | Theorem 11 | Theorem 7     | $\Delta$       | $O(1)$ |    | ✓ |   |   |
|               | Theorem 12 | Theorem 8     | $\bar{\Delta}$ | $O(1)$ | ✓* | ✓ |   |   |
|               |            | Lemma 21      | $\Delta$       | Tail   |    | ✓ | ✓ |   |
|               | Theorem 14 | Proposition 1 | $D$            | $O(1)$ |    |   |   | ✓ |


III. Uniform Random Number Generation

In this section, we investigate the uniform random number generation when there is no information leakage. Then, we discuss the single terminal Markov chain. In this case, as explained in Remark 4, all quantities with the superscript $\downarrow$ equal those with the superscript $\uparrow$, and these superscripts are omitted. We start this section by showing the problem setting in Section III-A. Then, we review and introduce some single-shot bounds in Section III-B. We derive non-asymptotic bounds for the Markov chain in Section III-C. Then, in Sections III-D and III-E, we show the asymptotic characterization for the large deviation regime and the moderate deviation regime by using those non-asymptotic bounds. We also derive the second order rate in Section III-F.

The results shown in this section are summarized in Table II. The checkmarks ✓ indicate that the tight asymptotic bounds (large deviation, moderate deviation, and second order) can be obtained from those bounds. The marks ✓* indicate that the large deviation bound can be derived up to the critical rate. The computational complexity "Tail" indicates that the computational complexities of those bounds depend on the computational complexities of tail probabilities.

In Table II, we did not call the bounds of Lemmas 19 and 18 theorems for the following reason. In Subsection I-A, we listed the requirements for the finite-length bounds. Hence, we give the status of Theorem only to a non-asymptotic bound with a computable form. However, Lemmas 19 and 18 require the calculation of the tail probability, whose computational complexity is not $O(1)$, at least in the Markovian case. Hence, Lemmas 19 and 18 are not given the status of Theorem although they yield asymptotically tight bounds.

A. Problem Formulation

We first present the problem formulation in the single-shot setting. Let $X$ be a source whose distribution is $P_X$. A random number generator is a function $f : \mathcal{X} \to \{1, \ldots, M\}$. The approximation error is defined by

$$\Delta[f] := \frac{1}{2} \left\| P_{f(X)} - P_U \right\|_1, \tag{103}$$

where $U$ is the uniform random variable on $\{1, \ldots, M\}$. For notational convenience, we introduce the infimum of the approximation error under the condition that the range size is $M$:

$$\Delta(M) := \inf_f \Delta[f]. \tag{104}$$

When we construct a random number generator, we often use a two-universal hash family $\mathcal{F}$ and a random function $F$ on $\mathcal{F}$. Then, we bound the approximation error averaged over the random function by only using the property of two-universality. As explained in Subsection I-E, to take practical aspects into account, we introduce the worst leaked information:

$$\bar{\Delta}(M) := \sup_F \mathbb{E}[\Delta[F]], \tag{105}$$

where the supremum is taken over all two-universal hash families from $\mathcal{X}$ to $\{1, \ldots, M\}$. From the definition, we obviously have $\Delta(M) \le \bar{\Delta}(M)$. When we consider the $n$-fold extension, the random number generator and related quantities are denoted with the subscript $n$. Instead of evaluating the approximation error $\Delta(M_n)$ (or $\bar{\Delta}(M_n)$) for given $M_n$, we are also interested in evaluating

$$M(n,\varepsilon) := \sup\{M_n : \Delta(M_n) \le \varepsilon\}, \tag{106}$$

$$\bar{M}(n,\varepsilon) := \sup\{M_n : \bar{\Delta}(M_n) \le \varepsilon\} \tag{107}$$

for given $0 \le \varepsilon \le 1$.

When the output size $M$ is too large, $\Delta(M)$ and $\bar{\Delta}(M)$ are close to 1. So, the criteria $\Delta(M)$ and $\bar{\Delta}(M)$ do not work as proper security measures. In this case, to quantify the imperfectness of the generated random number, following Wyner [36], we focus on the difference between the entropies of the generated random number and the ideal uniform random number, which is given as

$$\begin{aligned}
\log M - H(P_{f(X)}) &= \log M + \sum_z \Big(\sum_{x \in f^{-1}(z)} P_X(x)\Big) \log \Big(\sum_{x \in f^{-1}(z)} P_X(x)\Big) \\
&= D(P_{f(X)} \| P_U),
\end{aligned} \tag{108}$$

where $D(P\|Q)$ is the divergence between two distributions $P$ and $Q$. When the block size is $n$, we call the quantity $\frac{1}{n} D(P_{f(X)} \| P_U)$ the relative entropy rate. Then, we focus on the following quantities:

$$D(M) := \inf_f D(P_{f(X)} \| P_U), \tag{109}$$

$$\bar{D}(M) := \sup_F \mathbb{E}[D(P_{F(X)} \| P_U)], \tag{110}$$

where the supremum is taken over all two-universal hash families from $\mathcal{X}$ to $\{1, \ldots, M\}$. For the same reason as for $\bar{\Delta}(M)$, we consider the criterion $\bar{D}(M)$ in addition to the criterion $D(M)$.
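For a concrete generator $f$ on a small alphabet, both criteria can be computed directly from (103) and (108). The sketch below is illustrative only: the dictionary representation of $P_X$ and $f$, the output labels $0, \ldots, M-1$ (instead of $1, \ldots, M$), and the function names are assumptions made for the example.

```python
import math

def output_distribution(P, f, M):
    """Distribution of f(X) on {0, ..., M-1} for a source P (dict: symbol -> prob)."""
    Q = [0.0] * M
    for x, p in P.items():
        Q[f[x]] += p
    return Q

def approximation_error(P, f, M):
    """Delta[f] = (1/2) * || P_{f(X)} - P_U ||_1, cf. (103)."""
    return 0.5 * sum(abs(q - 1.0 / M) for q in output_distribution(P, f, M))

def entropy(Q):
    """Shannon entropy in nats."""
    return -sum(q * math.log(q) for q in Q if q > 0)

def divergence_from_uniform(P, f, M):
    """D(P_{f(X)} || P_U), computed directly as sum_z q(z) log(q(z) * M)."""
    return sum(q * math.log(q * M) for q in output_distribution(P, f, M) if q > 0)

# Hypothetical source and two candidate generators with M = 2.
P = {"a": 0.5, "b": 0.25, "c": 0.25}
f_balanced = {"a": 0, "b": 1, "c": 1}   # output is exactly uniform
f_skewed = {"a": 0, "b": 0, "c": 1}     # output is (0.75, 0.25)
```

For `f_balanced` both criteria vanish, while for `f_skewed` the divergence computed directly agrees with $\log M - H(P_{f(X)})$, which is the identity (108).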

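B. Single Shot Bounds

In this section, we review existing single-shot bounds and also show novel converse bounds. For the information measures used below, see Remark 4 in Section II, which explains the information measures when $\mathcal{Y}$ is a singleton. Furthermore, we need to introduce other information measures. For $P \in \mathcal{P}(\mathcal{X})$, let

$$H_{\min}(P) := \log \frac{1}{\max_x P(x)} \tag{111}$$

be the min-entropy. Then, let

$$H^{\varepsilon}_{\min}(P) := \max_{P' \in \mathcal{B}^{\varepsilon}(P)} H_{\min}(P') \tag{112}$$

and

$$\bar{H}^{\varepsilon}_{\min}(P) := \max_{P' \in \bar{\mathcal{B}}^{\varepsilon}(P)} H_{\min}(P') \tag{113}$$

be smooth min-entropies, where

$$\mathcal{B}^{\varepsilon}(P) := \left\{P' \in \mathcal{P}(\mathcal{X}) : \tfrac{1}{2}\|P - P'\|_1 \le \varepsilon\right\}, \tag{114}$$

$$\bar{\mathcal{B}}^{\varepsilon}(P) := \left\{P' \in \bar{\mathcal{P}}(\mathcal{X}) : \tfrac{1}{2}\|P - P'\|_1 \le \varepsilon\right\}, \tag{115}$$

and $\mathcal{P}(\mathcal{X})$ ($\bar{\mathcal{P}}(\mathcal{X})$) is the set of distributions (sub-distributions) over the set $\mathcal{X}$.

First, we have the following achievability bound.

Lemma 16 (Lemma 2.1.1 of [13]) We have

$$\Delta(M) \le \inf_{\gamma \ge 0} \left[P_X\left\{\log \frac{1}{P_X(X)} < \gamma\right\} + \frac{M}{e^{\gamma}}\right]. \tag{116}$$

By using the two-universal hash family, we can derive the following bound.

Lemma 17 ([25]) We have

$$\bar{\Delta}(M) \le \inf_{0 \le \varepsilon \le 1} \left[2\varepsilon + \frac{1}{2}\sqrt{M e^{-\bar{H}^{\varepsilon}_{\min}(P_X)}}\right]. \tag{117}$$

However, the bound in Lemma 17 cannot be directly calculated in the Markovian chain. To resolve this problem, we slightly loosen Lemma 17 as follows.

Lemma 18 We have

$$\bar{\Delta}(M) \le \inf_{\gamma \ge 0} \left[P_X\left\{\log \frac{1}{P_X(X)} < \gamma\right\} + \frac{1}{2}\sqrt{\frac{M}{e^{\gamma}}}\right]. \tag{118}$$

We also have the following achievability bound.

Lemma 19 (Theorem 1 of [12]) We have

$$\bar{\Delta}(M) \le \inf_{0 \le \theta \le 1} \frac{3}{2}\, M^{\frac{\theta}{1+\theta}}\, e^{-\frac{\theta}{1+\theta} H_{1+\theta}(X)}. \tag{119}$$

We also have the following converse bound, which is a special case of Lemma 28 ahead for the more general non-singleton case.

Lemma 20 We have

$$\Delta(M) \ge \min_{H^{\varepsilon}_{\min}(P_X) \ge \log M} \varepsilon. \tag{120}$$

Similar to Lemma 17, the bound in Lemma 20 cannot be directly calculated in the Markovian chain. To resolve this problem, we slightly loosen Lemma 20 as follows.

Lemma 21 We have

$$\Delta(M) \ge \max_{\gamma \ge 0}\, P_X\left\{\log \frac{1}{P_X(X)} < \gamma\right\} \left(1 - \frac{e^{\gamma}}{M}\right). \tag{121}$$

Proof: Fix arbitrary $\gamma \ge 0$. Then, from Lemma 20, there exists $P' \in \mathcal{B}^{\varepsilon}(P_X)$ such that

$$\Delta(M) \ge \frac{1}{2}\|P_X - P'\|_1, \tag{122}$$

$$\log \frac{1}{\max_x P'(x)} \ge \log M. \tag{123}$$

Then, we have

$$\begin{aligned}
\frac{1}{2}\|P_X - P'\|_1 &= \max_S \left(P_X(S) - P'(S)\right) && (124) \\
&\ge P_X\left\{x : \log \frac{1}{P_X(x)} < \gamma\right\} - P'\left\{x : \log \frac{1}{P_X(x)} < \gamma\right\} && (125) \\
&\ge P_X\left\{x : \log \frac{1}{P_X(x)} < \gamma\right\} - \sum_{x : \log \frac{1}{P_X(x)} < \gamma} \frac{1}{M} && (126) \\
&\ge P_X\left\{x : \log \frac{1}{P_X(x)} < \gamma\right\} - \sum_{x : \log \frac{1}{P_X(x)} < \gamma} P_X(x)\,\frac{e^{\gamma}}{M} && (127) \\
&= P_X\left\{\log \frac{1}{P_X(X)} < \gamma\right\} \left(1 - \frac{e^{\gamma}}{M}\right), && (128)
\end{aligned}$$

where (126) follows from (123). (122) and (128) yield (121). ■

Although Lemma 21 is useful for the large deviation regime and the moderate deviation regime, it is not useful for the second order regime. To resolve this problem, we loosen Lemma 21 as follows.

Lemma 22 (Lemma 2.1.2 of [13]) We have

$$\Delta(M) \ge \max_{\gamma \ge 0} \left[P_X\left\{\log \frac{1}{P_X(X)} < \gamma\right\} - \frac{e^{\gamma}}{M}\right]. \tag{129}$$

This fact implies that Lemma 21 is better than the previous bound given in Lemma 22.

Furthermore, by using a property of the strong universal hash family introduced in [12], we can derive the following converse.

13 The paper [12] introduced the strong universal hash family as a special case of a two-universal hash family. Theorem 2 of [12] shows that the strong universal hash family $F$ satisfies $\mathbb{E}[\Delta[F]] \ge \left(1 - \frac{|\Omega|}{M}\right) P_X(\Omega)$.

Lemma 23 (Theorem 2 of [12]) For any subset $\Omega \subset \mathcal{X}$ such that $|\Omega| \le M$, we have

$$\bar{\Delta}(M) \ge \left(1 - \frac{|\Omega|}{M}\right) P_X(\Omega). \tag{130}$$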

Similar to Lemmas 17 and 20, the bound in Lemma 23 cannot be directly calculated in the Markovian chain. To resolve this problem, we modify Lemma 23 as follows.


Lemma 24 For any $0 < \nu < 1$, we have

$$\bar{\Delta}(M) \ge (1-\nu)^2\, P_X\left\{\log \frac{1}{P_X(X)} < a(R)\right\}, \tag{131}$$

where $R = \log(\nu M)$, and $a(R)$ is the inverse function defined by (50).

Proof: See Appendix F. ■

To derive a converse bound for A (M) based on the Renyi 
entropy, we substitute the formula in Proposition [3 in Ap¬ 
pendix lAl into the bound in Lemmal2llfor a = 7 = log(M/2). 
So, we have the following. 


Theorem 7 We have 


— log A (M) 


< inf - 

s>0 g 
t»0(a) 


(1+ S )0( 


(1 + s) log ( 
log 2, 


Hi +§ (X)~ 

H i+(i+ s )e( x j) 

l _ e (8(a)-§)a-O(a)H 1+eM (X)+0H 1+s (X) 

(132) 


where a = log(M/2) and 8(a) is the inverse function defined 
in (l46b . 


Theorem 8 We have 


-logA(M) 

1 


< inf 

s>0 g 
0>g(a(R)) 


(1 + s)8^H 1+§ (X)-H 1+{1+s]§ (X) 

- (1 + s) log ^1 

_ e (e(a(R))-O)a(R)-0(a(R))H 1+gMR)) (X)+0H 1+§ (X) 


+ 2 log 2, 


(134) 


where R = log(M/2), and 9(a) and a(R) are the inverse 
functions defined in (l46t and (l50t . 

Proof: We evaluate — log A (M) by using Lemma [24] 
with v = 4. The probability Px (log p^{x) < a(P)j = 
Px {logPx(X) > —a(R)} can be evaluated by ( 1133b . Since 
(1 — v) 2 = we obtain (1 1 34b . ■ 

Finally, we address the relative entropy rate. As the direct 
part, we have the following theorem. 


Theorem 9 The relative entropy $\bar{D}(M)$ is evaluated as

$$\bar{D}(M) \le \frac{1}{\theta} \log\left(1 + M^{\theta} e^{-\theta H_{1+\theta}(X)}\right). \tag{135}$$

Proof: Lemma 10 of ED shows that any two-universal 
hash function F satisfies the relation 

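$$\mathbb{E}\left[M^{\theta} e^{-\theta H_{1+\theta}(F(X))}\right] \le 1 + M^{\theta} e^{-\theta H_{1+\theta}(X)}, \tag{136}$$

which implies that $\mathbb{E}[\log M - H(F(X))] \le \mathbb{E}[\log M - H_{1+\theta}(F(X))] = \mathbb{E}\,\frac{1}{\theta} \log\left(M^{\theta} e^{-\theta H_{1+\theta}(F(X))}\right) \le \frac{1}{\theta} \log \mathbb{E}\left(M^{\theta} e^{-\theta H_{1+\theta}(F(X))}\right) \le \frac{1}{\theta} \log\left(1 + M^{\theta} e^{-\theta H_{1+\theta}(X)}\right)$. ■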

As the converse part, we have the following theorem. 


Proof: We evaluate — logA(M) by using Lemma 
ITTI To evaluate the probability Px (log p^x) < = 

Px {logP Y (X) > —a}, we apply Proposition [3 in Appendix 
El to the random variable log Rx (X) whose cumulant gen¬ 
erating function f(p) is —9Hi + g(X). Then, p(—a) = 6(a). 
Hence, 


Proposition 1

$$D(M) \ge \log M - H(P_X). \tag{137}$$

Proof: Inequality (137) follows from the inequality $H(P_X) \ge H(P_{f(X)})$. ■
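To see how the direct and converse parts for the relative entropy compare on a concrete source, the following sketch evaluates the right hand side of Theorem 9, read here as $\frac{1}{\theta}\log(1 + M^{\theta}e^{-\theta H_{1+\theta}(X)})$ and minimized over a grid of $\theta \in (0,1]$, together with the lower bound $\log M - H(P_X)$ of Proposition 1. The restriction to $\theta \in (0,1]$ mirrors the range used in Theorem 13 and is an assumption, as are the function names and the example distribution.

```python
import math

def renyi_entropy(P, theta):
    """H_{1+theta}(X) = -(1/theta) log sum_x P(x)^(1+theta), in nats."""
    return -math.log(sum(p ** (1 + theta) for p in P if p > 0)) / theta

def shannon_entropy(P):
    """H(P_X) in nats."""
    return -sum(p * math.log(p) for p in P if p > 0)

def relative_entropy_upper(P, M, steps=1000):
    """Upper bound of Theorem 9, minimized over theta = k/steps, k = 1..steps."""
    best = float("inf")
    for k in range(1, steps + 1):
        theta = k / steps
        gap = math.log(M) - renyi_entropy(P, theta)
        best = min(best, math.log1p(math.exp(theta * gap)) / theta)
    return best

def relative_entropy_lower(P, M):
    """Lower bound of Proposition 1: log M - H(P_X)."""
    return math.log(M) - shannon_entropy(P)
```

Because $H_{1+\theta}(X) \le H(P_X)$ for $\theta > 0$, the upper bound always dominates the lower bound, which the two functions reproduce numerically for any output size $M$.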


- log P x {log Px(X) > -a} 


< inf - 

_s>0 s 
6>9(a) 


(1 + 8)o(H 1+s (X)-H 1H1+a)§ (X)) 


(1 + s) log (l - e («W-9)«-9(«)ffi+«wW+®X + sW' 

(133) 


Since 1 — ^ = 4, we obtain (1 132b . ■ 

To derive a converse bound for A (M) based on the Renyi 
entropy, we substitute the formula in Proposition [3 in Ap¬ 
pendix [A] into the bound in Lemma [24] for v = 4. So, we 
have the following. 


C. Finite-Length Bounds for Markov Source

In this subsection, we derive several finite-length bounds for the Markovian source with a computable form. Unfortunately, it is not easy to evaluate how tight these bounds are from their formulas alone. Their tightness will be discussed by considering the asymptotic limit in the remaining subsections of this section. Since we assume irreducibility for the transition matrix describing the Markovian chain, the following bounds hold with any initial distribution.

To lower bound $-\log \bar{\Delta}(M_n)$ by the Rényi entropy of the transition matrix, we substitute the formula for the Rényi entropy given in Lemma 8 into the bound in Lemma 19 and obtain the following bound.

Theorem 10 Let $R := \frac{1}{n} \log M_n$. Then we have

$$-\log \bar{\Delta}(M_n) \ge \sup_{0 \le \theta \le 1} \frac{-\theta n R + (n-1)\,\theta H^{W}_{1+\theta}(X) + \underline{\delta}(\theta)}{1+\theta} - \log(3/2). \tag{138}$$

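To upper bound $-\log \Delta(M_n)$ by the Rényi entropy of the
transition matrix, we substitute the formula for the tail probability
given in Proposition 4 with $a = R$ into the bound in
Lemma 21 with $\gamma = nR$. Then, we have the following bound.

Theorem 11 Let $R = \frac{1}{n}\log(M_n/2)$. If $a < R < H^{W}(X)$,
then we have

$$-\log \Delta(M_n) \le \inf_{\substack{s>0 \\ \tilde\theta>\theta(R)}} \frac{1}{s}\Big[(n-1)(1+s)\tilde\theta\big(H^{W}_{1+\tilde\theta}(X) - H^{W}_{1+(1+s)\tilde\theta}(X)\big) + \delta_1 - (1+s)\log\Big(1 - e^{(n-1)[(\theta(R)-\tilde\theta)R - \theta(R)H^{W}_{1+\theta(R)}(X) + \tilde\theta H^{W}_{1+\tilde\theta}(X)] + \delta_2}\Big)\Big] + \log 2, \tag{139}$$

where $\theta(a)$ is the inverse function defined in (46), and

$$\delta_1 := (1+s)\delta(\tilde\theta) - \delta((1+s)\tilde\theta), \tag{140}$$

$$\delta_2 := (\theta(R)-\tilde\theta)R + \delta(\tilde\theta) - \delta(\theta(R)). \tag{141}$$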


Proof: Theorem 11 can be shown in the same way
as Theorem 7 by replacing the role of Proposition 3 in
Appendix A with that of Proposition 4. ■

To upper bound $-\log \overline{\Delta}(M_n)$ by the Rényi entropy of the
transition matrix, we substitute the formula for the tail probability
given in Proposition 4 with $a = R$ into the bound in
Lemma 23. Then, we have the following bound.


Theorem 12 Let R be such that 


Proof: See Appendix G. ■

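To upper bound $\overline{D}(e^{nR})$ by the Rényi entropy of the transition
matrix, we substitute the formula for the Rényi entropy given
in Lemma 8 into the bound in Theorem 9. Then, we have the
following bound for the relative entropy rate $\frac{1}{n}\overline{D}(e^{nR})$.

Theorem 13 When $R - H^{W}_{1+\theta}(X) > 0$, for $\theta \in [0,1]$, we
have

$$\frac{1}{n}\overline{D}(e^{nR}) \le R - \frac{n-1}{n}H^{W}_{1+\theta}(X) + \frac{1}{\theta n}\big(\log 2 - \delta(\theta)\big). \tag{144}$$

Proof: Theorem 9 and Lemma 8 yield (a) and (b),
respectively, in the following way.

$$\begin{aligned}
\overline{D}(e^{nR})
&\overset{(a)}{\le} \frac{1}{\theta}\log\big(1 + e^{\theta(nR - H_{1+\theta}(X^n))}\big) \\
&\overset{(b)}{\le} \frac{1}{\theta}\log\big(1 + e^{\theta(nR - (n-1)H^{W}_{1+\theta}(X)) - \delta(\theta)}\big) \\
&= n\big(R - H^{W}_{1+\theta}(X)\big) + \frac{1}{\theta}\log\big(e^{n\theta(H^{W}_{1+\theta}(X) - R)} + e^{\theta H^{W}_{1+\theta}(X) - \delta(\theta)}\big) \\
&\le n\big(R - H^{W}_{1+\theta}(X)\big) + \frac{1}{\theta}\log\big(1 + e^{\theta H^{W}_{1+\theta}(X) - \delta(\theta)}\big) \\
&\le n\big(R - H^{W}_{1+\theta}(X)\big) + \frac{1}{\theta}\log\big(2e^{\theta H^{W}_{1+\theta}(X) - \delta(\theta)}\big) \\
&= n\big(R - H^{W}_{1+\theta}(X)\big) + \frac{1}{\theta}\big(\log 2 + \theta H^{W}_{1+\theta}(X) - \delta(\theta)\big) \\
&= nR - (n-1)H^{W}_{1+\theta}(X) + \frac{1}{\theta}\big(\log 2 - \delta(\theta)\big).
\end{aligned} \tag{145}$$

■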

To lower bound $D(e^{nR})$ by the Rényi entropy of the transition
matrix, we substitute the other formula for the Rényi entropy
given in Lemma 8 into the bound in Proposition 1. Then, we have
the following bound for the relative entropy rate $\frac{1}{n}D(e^{nR})$.


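$$(n-1)R + \big\{(1+\theta(a(R)))a(R) - \delta(\theta(a(R)))\big\} = \log(M_n/2). \tag{142}$$

If $R(a) < R < H^{W}(X)$, then we have

$$-\log \overline{\Delta}(M_n) \le \inf_{\substack{s>0 \\ \tilde\theta>\theta(a(R))}} \frac{1}{s}\Big[(n-1)(1+s)\tilde\theta\big(H^{W}_{1+\tilde\theta}(X) - H^{W}_{1+(1+s)\tilde\theta}(X)\big) + \delta_1 - (1+s)\log\big(1 - e^{C_{1,n}}\big)\Big] + 2\log 2, \tag{143}$$

where $\theta(a)$ and $a(R)$ are the inverse functions defined in (46)
and (50), and

$$C_{1,n} := (n-1)\Big[(\theta(a(R)) - \tilde\theta)a(R) - \theta(a(R))H^{W}_{1+\theta(a(R))}(X) + \tilde\theta H^{W}_{1+\tilde\theta}(X)\Big] + \delta_2,$$

$$\delta_1 := (1+s)\delta(\tilde\theta) - \delta((1+s)\tilde\theta),$$

$$\delta_2 := (\theta(a(R)) - \tilde\theta)a(R) + \delta(\tilde\theta) - \delta(\theta(a(R))).$$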


Theorem 14 For $\theta \in [0,1]$, we have

-D(e nR ) >R- ^—^H™ e (X) + (146) 

n n Un 

Proof: Lemma [ 8 ] implies that 

H(X n ) < <(n- l)H™_ e (X) - (147) 

V 

Hence, using Proposition 1, we obtain (146). ■


D. Large Deviation 

Taking the limit in the formulas in Theorems 10 and 12, we
have the following.

Theorem 15 For $R < H^{W}(X)$, we have

$$\liminf_{n\to\infty} -\frac{1}{n}\log \overline{\Delta}(e^{nR}) \ge \sup_{0\le\theta\le1} \frac{-\theta R + \theta H^{W}_{1+\theta}(X)}{1+\theta}. \tag{148}$$

On the other hand, for $R(a) < R < H^{W}(X)$, we have

$$\limsup_{n\to\infty} -\frac{1}{n}\log \overline{\Delta}(e^{nR}) \le -\theta(a(R))a(R) + \theta(a(R))H^{W}_{1+\theta(a(R))}(X) \tag{149}$$

$$= \sup_{0\le\theta} \frac{-\theta R + \theta H^{W}_{1+\theta}(X)}{1+\theta}. \tag{150}$$

Due to Lemma 7, the lower bound (148) and the upper
bound (150) coincide when $R$ is not less than the critical rate
$R_{\mathrm{cr}}$ given in (60).

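Proof: (138) yields (148). Lemma 7 guarantees (150). So,
we will prove (149) as follows.

We fix $s > 0$ and $\tilde\theta > \theta(a(R))$. Then, (143) implies that

$$\limsup_{n\to\infty} -\frac{1}{n}\log \overline{\Delta}(M_n) \le \frac{1+s}{s}\tilde\theta\big(H^{W}_{1+\tilde\theta}(X) - H^{W}_{1+(1+s)\tilde\theta}(X)\big). \tag{151}$$

Taking the limits $s \to 0$ and $\tilde\theta \to \theta(a(R))$, we have

$$\begin{aligned}
&\frac{1+s}{s}\tilde\theta\big(H^{W}_{1+\tilde\theta}(X) - H^{W}_{1+(1+s)\tilde\theta}(X)\big) \\
&= \frac{1}{s}\big(\tilde\theta H^{W}_{1+\tilde\theta}(X) - (1+s)\tilde\theta H^{W}_{1+(1+s)\tilde\theta}(X)\big) + \tilde\theta H^{W}_{1+\tilde\theta}(X) \\
&\to -\tilde\theta \frac{d[\theta H^{W}_{1+\theta}(X)]}{d\theta}\Big|_{\theta=\tilde\theta} + \tilde\theta H^{W}_{1+\tilde\theta}(X) \quad (\text{as } s \to 0) \\
&\to -\theta(a(R)) \frac{d[\theta H^{W}_{1+\theta}(X)]}{d\theta}\Big|_{\theta=\theta(a(R))} + \theta(a(R))H^{W}_{1+\theta(a(R))}(X) \quad (\text{as } \tilde\theta \to \theta(a(R))) \\
&\overset{(a)}{=} -\theta(a(R))a(R) + \theta(a(R))H^{W}_{1+\theta(a(R))}(X),
\end{aligned} \tag{152}$$

where (a) follows from (56). Hence, (152) and (151) imply
that

$$\limsup_{n\to\infty} -\frac{1}{n}\log \overline{\Delta}(M_n) \le -\theta(a(R))a(R) + \theta(a(R))H^{W}_{1+\theta(a(R))}(X), \tag{153}$$

which implies (149). ■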

For the general class of functions, we can derive the
following converse bound from Theorem 11.

Theorem 16 For $a < R < H^{W}(X)$, we have

$$\limsup_{n\to\infty} -\frac{1}{n}\log \Delta(e^{nR}) \le -\theta(R)R + \theta(R)H^{W}_{1+\theta(R)}(X). \tag{154}$$


E. Moderate Deviation 

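Taking the limit with $R = H^{W}(X) - n^{-t}\delta$ in Theorem 10
and Theorem 11 (or Theorem 12), we have the following.

Theorem 17 For arbitrary $t \in (0,1/2)$ and $\delta > 0$, we have

$$\lim_{n\to\infty} -\frac{1}{n^{1-2t}}\log \Delta\big(e^{nH^{W}(X) - n^{1-t}\delta}\big) = \lim_{n\to\infty} -\frac{1}{n^{1-2t}}\log \overline{\Delta}\big(e^{nH^{W}(X) - n^{1-t}\delta}\big) = \frac{\delta^2}{2V^{W}(X)}. \tag{155}$$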


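Proof: We apply Theorem 10 and Theorem 11 to the case
with $R = H^{W}(X) - n^{-t}\delta$, i.e., $\theta(a(R)) = \frac{n^{-t}\delta}{V^{W}(X)} + o(n^{-t})$.
Eqs. (54) and (138) in Theorem 10 imply that

$$\begin{aligned}
-\log \overline{\Delta}(M_n)
&\ge \sup_{0\le\theta\le1} \frac{-\theta nR + (n-1)\theta H^{W}_{1+\theta}(X)}{1+\theta} + \inf_{0\le\theta\le1} \frac{\delta(\theta)}{1+\theta} - \log(3/2) \\
&\ge n^{1-2t}\frac{\delta^2}{2V^{W}(X)} + o(n^{1-2t}).
\end{aligned} \tag{156}$$

We fix an arbitrary $s > 0$. Since $\theta(R) = \frac{n^{-t}\delta}{V^{W}(X)} + o(n^{-t})$,
we can choose $\tilde\theta > \theta(R)$ such that
$\tilde\theta = \frac{n^{-t}\delta}{V^{W}(X)} + o(n^{-t})$. Then, (139) implies that

$$\begin{aligned}
\lim_{n\to\infty} -\frac{1}{n^{1-2t}}\log \Delta(M_n)
&\le \lim_{n\to\infty} n^{2t}\frac{1+s}{s}\tilde\theta\big(H^{W}_{1+\tilde\theta}(X) - H^{W}_{1+(1+s)\tilde\theta}(X)\big) \\
&= -\lim_{n\to\infty} n^{2t}(1+s)\tilde\theta^{2}\frac{dH^{W}_{1+\theta}(X)}{d\theta}\Big|_{\theta=\tilde\theta} \\
&= (1+s)\frac{\delta^2}{2V^{W}(X)}.
\end{aligned} \tag{157}$$

Taking the limit $s \to 0$, we obtain the desired argument. ■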


F. Second Order 

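By applying the central limit theorem to Lemmas 18 and
22, and by using Theorem 2, we have the following.

Theorem 18 For arbitrary $\varepsilon \in (0,1)$, we have

$$\lim_{n\to\infty} \frac{\log M(n,\varepsilon) - nH^{W}(X)}{\sqrt{n}} = \lim_{n\to\infty} \frac{\log \overline{M}(n,\varepsilon) - nH^{W}(X)}{\sqrt{n}} = \sqrt{V^{W}(X)}\,\Phi^{-1}(\varepsilon). \tag{158}$$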

Proof: The central limit theorem for Markovian process 
fin . {48], {49] lf35l Corollary 6.2.] guarantees that the random 
variable $(-\log P_{X^n}(X^n) - nH^{W}(X))/\sqrt{n}$ asymptotically
obeys the normal distribution with average 0 and
variance $V^{W}(X)$. Let $R = \sqrt{V^{W}(X)}\,\Phi^{-1}(\varepsilon)$. Substituting
$M = e^{nH^{W}(X)+\sqrt{n}R}$ and $\gamma = nH^{W}(X) + \sqrt{n}R + n^{1/4}$ in
Lemma 18, we have

$$\lim_{n\to\infty} \overline{\Delta}\big(e^{nH^{W}(X)+\sqrt{n}R}\big) \le \varepsilon. \tag{159}$$

Also, substituting $M = e^{nH^{W}(X)+\sqrt{n}R}$ and
$\gamma = nH^{W}(X) + \sqrt{n}R - n^{1/4}$ in Lemma 22, we have

$$\lim_{n\to\infty} \Delta\big(e^{nH^{W}(X)+\sqrt{n}R}\big) \ge \varepsilon. \tag{160}$$

Combining (159) and (160), we obtain (158).

■ 
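
As a rough numerical companion to Theorem 18, the second-order approximation $\log M(n,\varepsilon) \approx nH^{W}(X) + \sqrt{nV^{W}(X)}\,\Phi^{-1}(\varepsilon)$ can be evaluated directly. The following is only a sketch: the function name is hypothetical, the $o(\sqrt{n})$ term is ignored, and the entropy and variance values in the usage lines are assumed placeholders rather than quantities computed in this paper.

```python
import math
from statistics import NormalDist


def second_order_log_size(n: int, entropy_rate: float, variance: float, eps: float) -> float:
    """Approximate log M(n, eps) in nats, ignoring the o(sqrt(n)) term.

    entropy_rate plays the role of H^W(X) and variance that of V^W(X);
    both must be supplied by the caller.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    # Phi^{-1}(eps) is the inverse CDF of the standard normal distribution.
    return n * entropy_rate + math.sqrt(n * variance) * NormalDist().inv_cdf(eps)


if __name__ == "__main__":
    # Assumed example values: H = 0.38 nats and V = 0.25 are placeholders.
    for eps in (1e-3, 1e-2, 0.5):
        print(eps, second_order_log_size(10_000, 0.38, 0.25, eps))
```

For $\varepsilon < 1/2$ the correction term is negative, so the approximate output length falls below $nH^{W}(X)$, in line with the sign of $\Phi^{-1}(\varepsilon)$ in (158).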
Fig. 1. Comparisons of the bounds for p = 0.1 and q = 0.2. The left and right graphs show the cases n = 10000 and n = 1000000, respectively. The
horizontal axis is $-\log_{10}(\varepsilon)$, and the vertical axis is the rate R (nats). The red dashed curve is the achievability bound in Theorem 10. The blue dotted curve
is the converse bound in Theorem 12. The purple thick curve is the converse bound in Theorem 11. The green horizontal line is the entropy $H^{W}(X)$.


G. Relative Entropy Rate (RER) 

Taking the limit in Theorems 13 and 14, we have the
following.

Theorem 19 The relative entropy rate (RER) is asymptotically
calculated as

$$\lim_{n\to\infty} \frac{1}{n}D(e^{nR}) = \lim_{n\to\infty} \frac{1}{n}\overline{D}(e^{nR}) = [R - H^{W}(X)]_{+}, \tag{161}$$

where $[x]_{+} := \max(x, 0)$.

Proof: When $R > H^{W}_{1+\theta}(X)$, (144) of Theorem 13
implies that

$$\lim_{n\to\infty} \frac{1}{n}\overline{D}(e^{nR}) \le R - H^{W}_{1+\theta}(X) \tag{162}$$

for $\theta \in (0,1)$. Since $\overline{D}(e^{nR}) \ge \overline{D}(e^{nR'})$ for $R \ge R'$, (162)
implies that

$$\lim_{n\to\infty} \frac{1}{n}\overline{D}(e^{nR}) \le [R - H^{W}_{1+\theta}(X)]_{+} \tag{163}$$

for $\theta \in (0,1)$ and any $R$.

Also, (146) of Theorem 14 implies that

$$\lim_{n\to\infty} \frac{1}{n}D(e^{nR}) \ge R - H^{W}_{1-\theta}(X) \tag{164}$$

for $\theta \in (0,1)$ and any $R$. Since $D(e^{nR}) \ge 0$, we have

$$\lim_{n\to\infty} \frac{1}{n}D(e^{nR}) \ge [R - H^{W}_{1-\theta}(X)]_{+} \tag{165}$$

for $\theta \in (0,1)$ and any $R$. Taking the limit $\theta \to 0$, we have
(161). ■

H. Numerical Example 

In this section, we numerically evaluate the achievability
bound in Theorem 10 and the converse bounds in Theorems
11 and 12. As shown in Theorem 15, the finite-length bounds
in Theorems 10 and 12 achieve the optimal rate in the sense of
large deviation when $R$ is larger than the critical rate. Hence,
we can expect that the converse bound in Theorem 12 is better
than that in Theorem 11. Now, we numerically demonstrate
how much better the converse bound in Theorem 12 is than that
in Theorem 11. Note that the single-shot bounds for the second
order in Lemmas 18 and 22 are not given in a computable
form in the Markovian case. So, we compare the bounds given
in Theorems 10, 11, and 12.



Fig. 2. The description of the transition matrix. 


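We consider a binary transition matrix $W$ given by Fig. 2,
i.e.,

$$W = \begin{pmatrix} 1-p & q \\ p & 1-q \end{pmatrix}. \tag{166}$$

In this case, the stationary distribution is

$$\tilde{P}(0) = \frac{q}{p+q}, \tag{167}$$

$$\tilde{P}(1) = \frac{p}{p+q}. \tag{168}$$

The entropy is

$$H^{W}(X) = \frac{q}{p+q}h(p) + \frac{p}{p+q}h(q), \tag{169}$$

where $h(\cdot)$ is the binary entropy function. The tilted transition
matrix is

$$W_{\theta} = \begin{pmatrix} (1-p)^{1+\theta} & q^{1+\theta} \\ p^{1+\theta} & (1-q)^{1+\theta} \end{pmatrix}. \tag{170}$$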
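
The stationary distribution (167)-(168) and the entropy (169) are easy to evaluate for the parameters used in Fig. 1. The following sketch assumes natural logarithms (nats), as in the figure; the function names are illustrative and not taken from the paper.

```python
import math


def binary_entropy(x: float) -> float:
    """Binary entropy h(x) in nats, with h(0) = h(1) = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)


def stationary_distribution(p: float, q: float):
    """Stationary distribution of W = [[1-p, q], [p, 1-q]], see (167)-(168)."""
    return q / (p + q), p / (p + q)


def markov_entropy(p: float, q: float) -> float:
    """Entropy H^W(X) in nats, following (169)."""
    pi0, pi1 = stationary_distribution(p, q)
    return pi0 * binary_entropy(p) + pi1 * binary_entropy(q)


if __name__ == "__main__":
    p, q = 0.1, 0.2
    print("stationary:", stationary_distribution(p, q))
    print("H^W(X) [nats]:", markov_entropy(p, q))
```

The stationarity condition $(1-p)\tilde{P}(0) + q\tilde{P}(1) = \tilde{P}(0)$ can be checked directly on the returned pair.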
TABLE III

Summary of the bounds for uniform random number generation with side-information.

| Ach./Conv. | Markov | Single Shot | $\overline{\Delta}$/$\Delta$/$\overline{D}$/$D$ | Complexity | Large Deviation | Moderate Deviation | Second Order | MMIR Rate |
|---|---|---|---|---|---|---|---|---|
| Achievability | Theorem 23 (Ass. 1) | (Lemma 27) | $\overline{\Delta}$ | O(1) | | ✓ | | |
| | Theorem 25 (Ass. 2) | Lemma 27 | $\overline{\Delta}$ | O(1) | ✓* | ✓ | | |
| | | Lemma 26 | $\overline{\Delta}$ | Tail | | ✓ | ✓ | |
| | Theorem 27 (Ass. 1) | Theorem 22 | $\overline{D}$ | O(1) | | | | ✓ |
| Converse | Theorem 24 (Ass. 1) | Theorem 20 | $\Delta$ | O(1) | | ✓ | | |
| | Theorem 26 (Ass. 2) | Theorem 21 | $\overline{\Delta}$ | O(1) | ✓* | ✓ | | |
| | | Lemma 29 | $\Delta$ | Tail | | ✓ | ✓ | |
| | Theorem 28 (Ass. 1) | Proposition 2 | $D$ | O(1) | | | | ✓ |


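The Perron-Frobenius eigenvalue is

$$\lambda_{\theta} = \frac{(1-p)^{1+\theta} + (1-q)^{1+\theta} + \sqrt{\{(1-p)^{1+\theta} - (1-q)^{1+\theta}\}^{2} + 4p^{1+\theta}q^{1+\theta}}}{2} \tag{171}$$

and its normalized eigenvector is

$$\tilde{P}_{\theta}(0) = \frac{q^{1+\theta}}{\lambda_{\theta} - (1-p)^{1+\theta} + q^{1+\theta}}, \tag{172}$$

$$\tilde{P}_{\theta}(1) = \frac{\lambda_{\theta} - (1-p)^{1+\theta}}{\lambda_{\theta} - (1-p)^{1+\theta} + q^{1+\theta}}. \tag{173}$$

The eigenvector of $W_{\theta}^{T}$ satisfying (64) is also given by

$$v_{\theta}(0) = \frac{q^{1+\theta}}{\min(\lambda_{\theta} - (1-p)^{1+\theta},\, q^{1+\theta})}, \tag{174}$$

$$v_{\theta}(1) = \frac{\lambda_{\theta} - (1-p)^{1+\theta}}{\min(\lambda_{\theta} - (1-p)^{1+\theta},\, q^{1+\theta})}. \tag{175}$$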
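
The closed forms (171)-(173) can be sanity-checked numerically by confirming that the normalized vector is indeed an eigenvector of the tilted matrix (170) with eigenvalue $\lambda_{\theta}$. The sketch below is a plain-Python check under that reading; all function names are illustrative.

```python
import math


def tilted_matrix(p: float, q: float, theta: float):
    """Entries of W_theta from (170), as a 2x2 nested list."""
    e = 1.0 + theta
    return [[(1 - p) ** e, q ** e],
            [p ** e, (1 - q) ** e]]


def pf_eigenvalue(p: float, q: float, theta: float) -> float:
    """Closed form (171) for the Perron-Frobenius eigenvalue of W_theta."""
    e = 1.0 + theta
    a, d = (1 - p) ** e, (1 - q) ** e
    return 0.5 * (a + d + math.sqrt((a - d) ** 2 + 4.0 * (p * q) ** e))


def pf_eigenvector(p: float, q: float, theta: float):
    """Normalized eigenvector from (172)-(173)."""
    e = 1.0 + theta
    lam = pf_eigenvalue(p, q, theta)
    v0, v1 = q ** e, lam - (1 - p) ** e
    return [v0 / (v0 + v1), v1 / (v0 + v1)]


def residual(p: float, q: float, theta: float) -> float:
    """Largest entry of |W_theta v - lambda v|; should be close to zero."""
    w = tilted_matrix(p, q, theta)
    lam = pf_eigenvalue(p, q, theta)
    v = pf_eigenvector(p, q, theta)
    return max(abs(w[i][0] * v[0] + w[i][1] * v[1] - lam * v[i]) for i in range(2))


if __name__ == "__main__":
    for theta in (0.0, 0.5, 1.0):
        print(theta, pf_eigenvalue(0.1, 0.2, theta), residual(0.1, 0.2, theta))
```

At $\theta = 0$ the eigenvalue equals 1 and the eigenvector reduces to the stationary distribution (167)-(168), which gives a quick consistency check.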


From these calculations, we can evaluate the bounds in Theorems
10, 11, and 12. When the initial distribution is given
as $P_X(0) = 1$ and $P_X(1) = 0$, for p = 0.1, q = 0.2, we
plotted the bounds in Fig. 1 for fixed block lengths n = 10000
and n = 1000000 and varying $\varepsilon = \Delta(M)$ or $\overline{\Delta}(M)$. The
two bounds in Theorems 11 and 12 have similar values in
the left of Fig. 1. However, the bound in Theorem 12 has a
clear advantage in the right of Fig. 1. That is, to clarify the
advantage of Theorem 12, we need a very large $n$ and a
very small $\varepsilon$. Although one may consider that n = 1000000 is
too large to realize, this size is realizable as follows. A typical
two-universal hash family can be realized by using Toeplitz
matrices. This kind of two-universal hash family with $n = 10^8$ was
realized efficiently by using a typical personal computer Go] 
Appendix B]j9). 
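
To make the Toeplitz construction concrete, the following is a minimal sketch of hashing over GF(2) with a random Toeplitz matrix. It is a naive $O(nm)$ illustration and not the efficient implementation referred to above; the function names and the seed layout (`seed_bits[i - j + n - 1]` as entry $(i,j)$) are assumptions made for this example.

```python
import secrets


def toeplitz_hash(x_bits, seed_bits, m):
    """Multiply the m x n binary Toeplitz matrix T by x over GF(2).

    T[i][j] = seed_bits[i - j + n - 1], so the matrix is constant along
    diagonals and is described by n + m - 1 seed bits.
    """
    n = len(x_bits)
    if len(seed_bits) != n + m - 1:
        raise ValueError("seed must contain n + m - 1 bits")
    out = []
    for i in range(m):
        acc = 0
        for j in range(n):
            acc ^= seed_bits[i - j + n - 1] & x_bits[j]
        out.append(acc)
    return out


def random_seed(n, m):
    """Draw the n + m - 1 seed bits uniformly at random."""
    return [secrets.randbits(1) for _ in range(n + m - 1)]


if __name__ == "__main__":
    n, m = 16, 4
    x = [secrets.randbits(1) for _ in range(n)]
    print(toeplitz_hash(x, random_seed(n, m), m))
```

Because the map is linear in `x_bits`, two distinct inputs collide exactly when their XOR is sent to zero; for a uniformly random seed this happens with probability $2^{-m}$, which is the two-universality property used in the bounds.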


IV. Secure Uniform Random Number Generation 

In this section, we investigate secure random number
generation with partial information leakage, which is also
known as privacy amplification. We start this section
by showing the problem setting in Section IV-A. Then, we
review and introduce some single-shot bounds in Section IV-B.
We derive non-asymptotic bounds for the Markov chain in
Section IV-C. Then, in Sections IV-D and IV-E, we show the
asymptotic characterization for the large deviation regime and
the moderate deviation regime by using those non-asymptotic
bounds. We also derive the second order rate in Section IV-F.

The results shown in this section are summarized in Table
III. The checkmarks ✓ indicate that the tight asymptotic
bounds (large deviation, moderate deviation, and second order)
can be obtained from those bounds. The marks ✓* indicate
that the large deviation bound can be derived up to the
critical rate. The computational complexity "Tail" indicates
that the computational complexities of those bounds depend on
the computational complexities of tail probabilities. It should
be noted that Theorem 23 is derived from a special case
($Q_Y = P_Y$) of Lemma 27. The asymptotically optimal choice
is $Q_Y = P_Y^{(1+\theta)}$, which corresponds to (190) of Lemma
27. Under Assumption 1, we can derive the bound of the
Markov case only for that special choice of $Q_Y$, while under
Assumption 2, we can derive the bound of the Markov case for
the optimal choice of $Q_Y$. Here, we do not call several lemmas
theorems although they yield asymptotically tight bounds.
This is because they are not in a computable form, as explained at
the beginning of Section III.

A. Problem Formulation 

The privacy amplification is conducted by a function
$f : \mathcal{X} \to \{1, \ldots, M\}$. The security of the generated key is
evaluated by

$$\Delta[f] := \frac{1}{2}\|P_{f(X)Y} - P_U \times P_Y\|_1, \tag{176}$$

where $U$ is the uniform random variable on $\{1, \ldots, M\}$ and
$\|\cdot\|_1$ is the variational distance. For notational convenience,
we introduce the infimum of the security criterion under the
condition that the range size is $M$:

$$\Delta(M) := \inf_{f} \Delta[f]. \tag{177}$$
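
For very small alphabets, the criterion (176) and the infimum (177) can be evaluated by brute force. The sketch below assumes the joint distribution is given as a nested list `p_xy[x][y]` and that the output labels are `0, ..., m-1` instead of `1, ..., M`; both conventions and the function names are choices made for this illustration.

```python
import itertools


def security_criterion(p_xy, f, m):
    """Delta[f] = (1/2) * || P_{f(X)Y} - P_U x P_Y ||_1 as in (176).

    p_xy[x][y] is the joint distribution and f[x] in {0, ..., m-1}.
    """
    nx, ny = len(p_xy), len(p_xy[0])
    p_y = [sum(p_xy[x][y] for x in range(nx)) for y in range(ny)]
    # Joint distribution of the hashed value f(X) and the side-information Y.
    p_fy = [[0.0] * ny for _ in range(m)]
    for x in range(nx):
        for y in range(ny):
            p_fy[f[x]][y] += p_xy[x][y]
    return 0.5 * sum(abs(p_fy[k][y] - p_y[y] / m)
                     for k in range(m) for y in range(ny))


def best_security(p_xy, m):
    """Delta(M) from (177) by exhaustive search over all m**|X| functions."""
    nx = len(p_xy)
    return min(security_criterion(p_xy, f, m)
               for f in itertools.product(range(m), repeat=nx))


if __name__ == "__main__":
    # X uniform on 4 symbols and independent of Y: one perfect bit exists.
    p_xy = [[0.1, 0.15] for _ in range(4)]
    print(security_criterion(p_xy, [0, 0, 1, 1], 2))
    print(best_security(p_xy, 2))
```

The exhaustive search is exponential in $|\mathcal{X}|$ and is meant only to illustrate the definitions, not as a practical method.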

When we construct a function for the privacy amplification, 
we often use a two-universal hash family T and a random 
function F on T. Then, we bound the security criterion 
averaged over the random function by only using the property 
of two-universality. As explained in Subsection II-EI to take 
into the practical aspects, we introduce the worst leaked 
information: 

$$\overline{\Delta}(M) := \sup_{F} \mathbb{E}[\Delta[F]], \tag{178}$$

where the supremum is taken over all two-universal hash
families from $\mathcal{X}$ to $\{1, \ldots, M\}$. From the definition, we
obviously have

$$\Delta(M) \le \overline{\Delta}(M). \tag{179}$$

When we consider the $n$-fold extension, the security criteria
are denoted by $\Delta(M_n)$ or $\overline{\Delta}(M_n)$. As in the single-terminal
case, we also introduce the quantities $M(n,\varepsilon)$ and $\overline{M}(n,\varepsilon)$
(cf. GM and dToTb l. 


Remark 5 Note that the security definition in (176) implies
the universal composable security criterion EDI, ED. A 
slightly weaker security criterion defined by

$$\inf_{Q_Y} \frac{1}{2}\|P_{f(X)Y} - P_U \times Q_Y\|_1 \tag{180}$$

also implies the universal composable security criterion. In fact,
some literature employs this kind of security criterion [52],
[26], [53]. Since the triangle inequality and the information
processing inequality $\|Q_Y - P_Y\|_1 \le \|P_U \times Q_Y - P_{f(X)Y}\|_1$
imply

$$\begin{aligned}
\frac{1}{2}\|P_{f(X)Y} - P_U \times P_Y\|_1
&\le \frac{1}{2}\|P_{f(X)Y} - P_U \times Q_Y\|_1 + \frac{1}{2}\|P_U \times Q_Y - P_U \times P_Y\|_1 \\
&= \frac{1}{2}\|P_{f(X)Y} - P_U \times Q_Y\|_1 + \frac{1}{2}\|Q_Y - P_Y\|_1 \\
&\le \frac{1}{2}\|P_{f(X)Y} - P_U \times Q_Y\|_1 + \frac{1}{2}\|P_U \times Q_Y - P_{f(X)Y}\|_1,
\end{aligned}$$

the inequality

$$\frac{1}{2}\|P_{f(X)Y} - P_U \times P_Y\|_1 \le \|P_{f(X)Y} - P_U \times Q_Y\|_1 \tag{181}$$

holds for any $Q_Y$. Thus, the two criteria differ only by a constant
factor, which means that the asymptotic behaviors of the large
deviation regime and the moderate deviation regime are not
affected by the choice of the security criteria.

For the second order regime, the same fact can be shown
as follows. The achievability part (Lemma 26 given in
Subsection IV-B) can be used without modification since the
optimization over $Q_Y$ is already incorporated into the bound.
For the converse part, we need to replace $H^{\varepsilon}_{\min}(P_{XY}|P_Y)$
with $H^{\varepsilon}_{\min}(P_{XY}|Q_Y)$ in Lemma 28 given in Subsection IV-B.
Then, the converse bound in Lemma 29 given in Subsection
IV-B is modified accordingly, i.e.,


AIM) > inf max 
Qy 7>0 


PxY 



Qy(y) 

Pxv{x,y) 




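However, by noting the inequality

$$\begin{aligned}
P_{XY}\Big\{\log\frac{Q_Y(y)}{P_{XY}(x,y)} < \gamma\Big\}
&\ge P_{XY}\Big\{\log\frac{1}{P_{X|Y}(x|y)} < \gamma - \nu\Big\} - P_{XY}\Big\{\log\frac{Q_Y(y)}{P_Y(y)} > \nu\Big\} \\
&\ge P_{XY}\Big\{\log\frac{1}{P_{X|Y}(x|y)} < \gamma - \nu\Big\} - e^{-\nu}
\end{aligned} \tag{182}$$

for any $\nu > 0$, the choice $Q_Y = P_Y$ turns out to be the optimal
choice asymptotically up to the second order. Thus, the asymptotic
behavior of the second order regime is also not affected by
the choice of the security criteria.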

When the output size $M$ is too large, $\overline{\Delta}(M)$ is close to
1. In this case, to quantify the performance of the
output random number, following Csiszár-Narayan [29], we
focus on the relative entropy between the generated random
number and the ideal random number as follows.

$$D(P_{f(X)Y}\|P_U \times P_Y) = \log M - H(f(X)|Y) = I(f(X);Y) + D(P_{f(X)}\|P_U). \tag{183}$$

Since this quantity can be regarded as a modification of the
mutual information $I(f(X);Y)$, we call it the modified mutual
information. This quantity is naturally given under axiomatic
conditions [28]. Then, we address the following quantities.

$$D(M) := \inf_{f} D(P_{f(X)Y}\|P_U \times P_Y), \tag{184}$$

$$\overline{D}(M) := \sup_{F} \mathbb{E}[D(P_{F(X)Y}\|P_U \times P_Y)] = \sup_{F} D(P_{F(X)YF}\|P_U \times P_Y \times P_F), \tag{185}$$

where the supremum is taken over all two-universal hash
families from $\mathcal{X}$ to $\{1, \ldots, M\}$. The reason why we consider
such a supremum is the same as in the case of $\overline{\Delta}(M)$.
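
The two expressions for the modified mutual information in (183) can be compared numerically. The following sketch works directly with the joint distribution of $(f(X), Y)$, given as a nested list `p_ky[k][y]` with $M$ rows; this input format and the function names are assumptions for the example, and all quantities are in nats.

```python
import math


def marginal_y(p_ky):
    """Marginal distribution of Y from the joint p_ky[k][y]."""
    return [sum(row[y] for row in p_ky) for y in range(len(p_ky[0]))]


def modified_mutual_information(p_ky):
    """D(P_{f(X)Y} || P_U x P_Y), the left-hand side of (183)."""
    m = len(p_ky)
    p_y = marginal_y(p_ky)
    return sum(p * math.log(p * m / p_y[y])
               for row in p_ky for y, p in enumerate(row) if p > 0)


def conditional_entropy(p_ky):
    """H(f(X)|Y)."""
    p_y = marginal_y(p_ky)
    return -sum(p * math.log(p / p_y[y])
                for row in p_ky for y, p in enumerate(row) if p > 0)


def mutual_information_plus_divergence(p_ky):
    """I(f(X);Y) + D(P_{f(X)} || P_U), the right-hand side of (183)."""
    m = len(p_ky)
    p_y = marginal_y(p_ky)
    p_k = [sum(row) for row in p_ky]
    mi = sum(p * math.log(p / (p_k[k] * p_y[y]))
             for k, row in enumerate(p_ky) for y, p in enumerate(row) if p > 0)
    div = sum(pk * math.log(pk * m) for pk in p_k if pk > 0)
    return mi + div


if __name__ == "__main__":
    p_ky = [[0.3, 0.1], [0.2, 0.4]]
    print(modified_mutual_information(p_ky))
    print(math.log(len(p_ky)) - conditional_entropy(p_ky))
    print(mutual_information_plus_divergence(p_ky))
```

All three printed values agree up to floating-point error, reflecting that the quantity penalizes both correlation with $Y$ and non-uniformity of $f(X)$.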


B. Single Shot Bounds 

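In this section, we review existing single-shot bounds and
show a novel converse bound. For the information measures
used below, see Section II. We also introduce the following
information measures. For $P_{XY} \in \mathcal{P}(\mathcal{X} \times \mathcal{Y})$ and $Q_Y \in \mathcal{P}(\mathcal{Y})$,
let

$$H_{\min}(P_{XY}|Q_Y) := -\log \max_{x,y} \frac{P_{XY}(x,y)}{Q_Y(y)} \tag{186}$$

be the conditional min-entropy. Then, for $P_{XY} \in \mathcal{P}(\mathcal{X} \times \mathcal{Y})$,
let

$$H^{\varepsilon}_{\min}(P_{XY}|Q_Y) := \max_{P'_{XY} \in B^{\varepsilon}(P_{XY})} H_{\min}(P'_{XY}|Q_Y) \tag{187}$$

and

$$\overline{H}^{\varepsilon}_{\min}(P_{XY}|Q_Y) := \max_{P'_{XY} \in \overline{B}^{\varepsilon}(P_{XY})} H_{\min}(P'_{XY}|Q_Y) \tag{188}$$

be the smooth min-entropies, where

$$B^{\varepsilon}(P_{XY}) := \Big\{P'_{XY} \in \mathcal{P}(\mathcal{X} \times \mathcal{Y}) : \frac{1}{2}\|P_{XY} - P'_{XY}\|_1 \le \varepsilon\Big\},$$

$$\overline{B}^{\varepsilon}(P_{XY}) := \Big\{P'_{XY} \in \overline{\mathcal{P}}(\mathcal{X} \times \mathcal{Y}) : \frac{1}{2}\|P_{XY} - P'_{XY}\|_1 \le \varepsilon\Big\}.$$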
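
The non-smoothed conditional min-entropy (186) is a simple maximization and can be computed directly. The sketch below covers only (186), not the smoothed versions (187)-(188); the input format `p_xy[x][y]` and the convention of returning minus infinity when the support condition on $Q_Y$ fails are assumptions for this example.

```python
import math


def conditional_min_entropy(p_xy, q_y):
    """H_min(P_XY | Q_Y) = -log max_{x,y} P_XY(x, y) / Q_Y(y), in nats."""
    worst_ratio = 0.0
    for row in p_xy:
        for y, p in enumerate(row):
            if p > 0:
                if q_y[y] == 0:
                    # Support of P_Y is not contained in that of Q_Y.
                    return -math.inf
                worst_ratio = max(worst_ratio, p / q_y[y])
    return -math.log(worst_ratio)


if __name__ == "__main__":
    # X uniform on 4 symbols and independent of Y, with Q_Y = P_Y.
    p_xy = [[0.1, 0.15] for _ in range(4)]
    print(conditional_min_entropy(p_xy, [0.4, 0.6]))
```

In this example every ratio $P_{XY}(x,y)/Q_Y(y)$ equals $1/4$, so the result is $\log 4$.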

By using the two-universal hash family, we can derive the 
following bound. 

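Lemma 25 ([25]) For any $Q_Y \in \mathcal{P}(\mathcal{Y})$, we have

$$\overline{\Delta}(M) \le 2\varepsilon + \frac{1}{2}\sqrt{M e^{-\overline{H}^{\varepsilon}_{\min}(P_{XY}|Q_Y)}}.$$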

1 technically, we restrict Qy to be such that supp {Py) C supp(Qy). 
However, the bound in Lemma 25 cannot be directly calculated
for the Markov chain. To resolve this problem, we
slightly loosen Lemma 25 as follows (cf. [28, Theorem 23]
or [27, Lemma 3]).

Lemma 26 For any Qy £ 'Pty), we have 
A (M) < 

We also have the following exponential bound. 


inf 

7>0 


PXY 5 log 


Qy{y) 

Pxy(x, y) 


< 7 > + 


Similar to Lemmas 25 and 28, the bound in Lemma 31
cannot be directly calculated for the Markov chain. To
resolve this problem, we slightly loosen Lemma 31 as follows.

Lemma 32 For any 0 < v < 1, we have 

f p (l+ 9 (a(R))) ( A 'l 

A (M) > (1 - vfP XY \ log r rc ^ ] < a(R) \ ( 195) 

where R = log (Mu), and 9(a) and a(R) are the inverse func¬ 
tions O' (a) and a^(R) defined by ( l20l ) and (THT) respectively. 


Lemma 27 ( lTT2l l) We have 

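$$\overline{\Delta}(M) \le \min_{Q_Y \in \mathcal{P}(\mathcal{Y})} \inf_{0\le\theta\le1} \frac{3}{2} M^{\frac{\theta}{1+\theta}} e^{-\frac{\theta}{1+\theta}H_{1+\theta}(P_{XY}|Q_Y)} \tag{189}$$

$$= \inf_{0\le\theta\le1} \frac{3}{2} M^{\frac{\theta}{1+\theta}} e^{-\frac{\theta}{1+\theta}H^{\uparrow}_{1+\theta}(X|Y)}. \tag{190}$$

For the converse bound, the following is known (see footnote 15).

Lemma 28 ([25]) We have

$$\Delta(M) \ge \min_{H^{\varepsilon}_{\min}(P_{XY}|P_Y) \ge \log M} \varepsilon. \tag{191}$$

Similar to Lemma 25, the bound in Lemma 28 cannot be
directly calculated for the Markov chain. To resolve this
problem, we slightly loosen Lemma 28 as follows.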

Lemma 29 We have 


A (M) > max 
7 >o 


PxY 



1 

Px\v(x\y) 




Proof: The proof is exactly the same as Lemma [21] ■ 

Although Lemma 29 is useful for the large deviation regime
and the moderate deviation regime, it is not useful for the
second order regime. To resolve this problem, we loosen
Lemma 29 as follows.

Lemma 30 We have

$$\Delta(M) \ge \sup_{\gamma \ge 0}\Big[P_{XY}\Big\{\log\frac{1}{P_{X|Y}(x|y)} < \gamma\Big\} - \frac{e^{\gamma}}{M}\Big]. \tag{193}$$


Furthermore, by using a property of the strong universal
hash family, we can derive the following converse as a
generalization of Lemma 23.

Lemma 31 For $\{\Omega_y\}_{y \in \mathcal{Y}}$ such that $|\Omega_y| \le N \le M$ for every
$y \in \mathcal{Y}$, let $\Omega = \cup_{y \in \mathcal{Y}} \Omega_y \times \{y\}$. Then, we have

$$\overline{\Delta}(M) \ge \Big(1 - \frac{N}{M}\Big)^{2} P_{XY}(\Omega). \tag{194}$$

Proof: We apply Lemma 23 to each $P_{X|Y}(\cdot|y)$ and take the
average over $y$. Then, we can derive the lemma since $|\Omega_y| \le N$
by the assumption. ■

15 See also ED for a proof that is specialized for the classical case. 


Proof: See Appendix [H] ■ 

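To derive a converse bound for $\Delta(M)$ based on the conditional
Rényi entropy, we substitute the formula in Proposition
3 in Appendix A into the bound in Lemma 29 for
$a = \gamma = \log(M/2)$. So, we have the following.

Theorem 20 We have

$$-\log \Delta(M) \le \inf_{\substack{s>0 \\ \tilde\theta>\theta(a)}} \frac{1}{s}\Big[(1+s)\tilde\theta\big(H^{\downarrow}_{1+\tilde\theta}(X|Y) - H^{\downarrow}_{1+(1+s)\tilde\theta}(X|Y)\big) - (1+s)\log\Big(1 - e^{(\theta(a)-\tilde\theta)a - \theta(a)H^{\downarrow}_{1+\theta(a)}(X|Y) + \tilde\theta H^{\downarrow}_{1+\tilde\theta}(X|Y)}\Big)\Big] + \log 2, \tag{196}$$

where $a = \log(M/2)$, and $\theta(a)$ is the inverse function $\theta^{\downarrow}(a)$
defined by (18).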


Proof: Theorem [20] can be shown by the same way as 
Theorem QT] with replacing the role of Lemma Eli by Lemma 


To derive a converse bound for $\overline{\Delta}(M)$ based on the conditional
Rényi entropy, we substitute the formula in Proposition
3 in Appendix A into the bound in Lemma 32 for $\nu = 1/2$. So,
we have the following.


Theorem 21 We have 


< 


-log A (M) 

inf - 

_ s >° s 

9>0( a (R)) 


(1 + s)9l H 


'l+e,l+0(a(fl)) 


(x|y) 


- - (! + s) log (1 - e c *■») 

+ 2 log 2, (197) 

where R = log(M/2), 


C 2 , n :=[0(a(R)) - 9}a(R) - 9(a(R))Hl +g{a{R)) (X\Y) 

+ M 1+ g tl+elam (X\Y), 

and 9(a) and a(R) are the inverse functions 9^(a) and af(R) 
defined by <[20l) and (fTil respectively. 
Proof: Theorem 21 can be shown in the same way as
Theorem 8 by replacing the role of Lemma 24 with that of Lemma
32. ■

Finally, we address the modified mutual information rate
(MMIR). As the direct part, we have the following theorem.

Theorem 22 The maximum modified mutual information
$\overline{D}(M)$ among two-universal hash families is bounded as

$$\overline{D}(M) \le \frac{1}{\theta}\log\big(1 + M^{\theta}e^{-\theta H^{\downarrow}_{1+\theta}(X|Y)}\big). \tag{198}$$

Proof: Lemma 10 of [47] shows that any two-universal
hash function $F$ satisfies the relation

$$\mathbb{E}\big[M^{\theta}e^{-\theta H^{\downarrow}_{1+\theta}(F(X)|Y)}\big] \le 1 + M^{\theta}e^{-\theta H^{\downarrow}_{1+\theta}(X|Y)}, \tag{199}$$

which implies that
$\mathbb{E}[\log M - H(F(X)|Y)] \le \mathbb{E}[\log M - H^{\downarrow}_{1+\theta}(F(X)|Y)]
\le \frac{1}{\theta}\log\mathbb{E}\big[M^{\theta}e^{-\theta H^{\downarrow}_{1+\theta}(F(X)|Y)}\big]
\le \frac{1}{\theta}\log\big(1 + M^{\theta}e^{-\theta H^{\downarrow}_{1+\theta}(X|Y)}\big)$. ■

As the converse part, we have the following theorem.

Proposition 2

$$D(M) \ge \log M - H(X|Y) \tag{200}$$

Proof: Inequality (200) follows from the inequality
$H(X|Y) \ge H(f(X)|Y)$. ■


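where $\theta(a)$ is the inverse function $\theta^{\downarrow}(a)$ defined by (46), and

$$C_{3,n} := (n-1)\Big((\theta(R)-\tilde\theta)R - \theta(R)H^{\downarrow,W}_{1+\theta(R)}(X|Y) + \tilde\theta H^{\downarrow,W}_{1+\tilde\theta}(X|Y)\Big) + \delta_2, \tag{203}$$

$$\delta_1 := (1+s)\delta(\tilde\theta) - \delta((1+s)\tilde\theta), \tag{204}$$

$$\delta_2 := (\theta(R)-\tilde\theta)R - \delta(\theta(R)) + \delta(\tilde\theta). \tag{205}$$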

Proof: Theorem [24] can be shown by the same way 
as Theorem QT| with replacing the roles of Lemma |2T] and 
Proposition [3] in Appendix [A] by Lemma [20] and Proposition 
[4] ■ 

Next, we derive tighter bounds under Assumption 2. To
lower bound $-\log \overline{\Delta}(M_n)$ by the upper conditional Rényi
entropy of the transition matrix, we substitute the formula for
the upper conditional Rényi entropy given in Lemma 9 into
the bound in Lemma 27. Then, we have the following achievability
bound.

Theorem 25 Suppose that a transition matrix $W$ satisfies
Assumption 2. Let $R := \frac{1}{n}\log M_n$. Then we have

C. Finite-Length Bounds for Markov Source 

Since we assume irreducibility of the transition matrix
describing the Markov chain, the following bounds hold
for any initial distribution. To lower bound $-\log \overline{\Delta}(M_n)$ by
the lower conditional Rényi entropy of the transition matrix, we
substitute the formula for the lower conditional Rényi entropy
given in Lemma 8 into the bound in Lemma 27 for
$Q_{Y^n} = P_{Y^n}$. Then, we have the following achievability bound.

Theorem 23 Suppose that a transition matrix $W$ satisfies
Assumption 1. Let $R := \frac{1}{n}\log M_n$. Then we have

$$-\log \overline{\Delta}(M_n) \ge \sup_{0\le\theta\le1} \frac{-\theta nR + (n-1)\theta H^{\downarrow,W}_{1+\theta}(X|Y) + \delta(\theta)}{1+\theta} - \log(3/2). \tag{201}$$

To upper bound $-\log \Delta(M_n)$ by the lower conditional
Rényi entropy of the transition matrix, we substitute the formula
for the tail probability given in Proposition 4 with $a = R$
into the bound in Lemma 29 with $\gamma = nR$. Then, we have the
following converse bound.

Theorem 24 Suppose that a transition matrix $W$ satisfies
Assumption 1. Let $R := \frac{1}{n}\log(M_n/2)$. For any $a < R <
H^{W}(X|Y)$, we have

- log A (M n ) 


1 


< inf - 

_3>o s 
0>9(a) 


-H. 

+ log 2 , 


(n - i)(i +(Y|y) 


( 202 ) 


- log A (Mn) 

~9nR + (n- 1)9 (X\Y) 


> sup 
o<e<i 


1 + 9 


-m- log(3/2). 

(206) 


To upper bound —logA (M n ) by the upper conditional 
Renyi entropy of transition matrix, we substitute the formula 
for the tail probability given in and Proposition 0 in Appendix 
IaI into the bound in Lemma 1.3 l[Ll we have the following 
converse bound. 


Theorem 26 Suppose that a transition matrix W satisfies 
Assumption [2] Let R be such that 


(n - 1 )R + ^(1 + 9(a(R)))(a(R) - S(9(a(R))))j 
= log(M„/2). (207) 

If R(a) < R < H n (X\Y), then we have 


— log A(M, 
< inf - 

s>0 g 
9>0(a(R» 


(n-l)(l +s)e[H™ h +w)) (Y|r) 


- <(!+.,>(V| y ))+J. 


— (1 + s) log (l — e C4 ’ n ) 


2 log 2 , (208) 


16 We cannot apply Proposition |4] here since we cannot apply Lemma [34] 
for </>(p; Px n Y n \Qyn P ^)- Instead, we need to apply Lemma ITo] 
where $\theta(a)$ and $a(R)$ are the inverse functions $\theta^{\uparrow}(a)$ and
$a^{\uparrow}(R)$ defined by (56) and (58), respectively.


C 4 ,n :=(n-l) (0(a(R)) - 9)(a(R)) 


-0(a(R))H^ a(R)) (X\Y) 


I n zjW 

l+0,l+0(a(R)) 


(X\Y) 


S 2 (209) 


Si :=(l + s)((9,O(a(R)))-(((l + S )0,O(a(R))), (210) 
62 :=(0(a(H)) - 9)(a(R)) - ((9(a(R)),0(a(R))) 

+ {{0,0(a(R))). (211) 


where $\theta(a)$ is the inverse function $\theta^{\downarrow}(a)$ defined by (46).

Under Assumption 2, taking the limit in Theorems 25 and
26, we have the following tighter bound.

Theorem 30 Suppose that a transition matrix $W$ satisfies
Assumption 2. For $R < H^{W}(X|Y)$, we have

$$\liminf_{n\to\infty} -\frac{1}{n}\log \overline{\Delta}(e^{nR}) \ge \sup_{0\le\theta\le1} \frac{-\theta R + \theta H^{\uparrow,W}_{1+\theta}(X|Y)}{1+\theta}. \tag{217}$$

On the other hand, for $R(a) < R < H^{W}(X|Y)$, we have


Proof: See Appendix [I] ■ 

We derive finite-length bounds for the modified mutual information
rate under Assumption 1 by substituting the formula
for the lower conditional Rényi entropy given in Lemma 8
into the bound in Theorem 22.

Theorem 27 When $R - H^{\downarrow,W}_{1+\theta}(X|Y) > 0$, for $\theta \in [0,1]$, we
have

$$\overline{D}(e^{nR}) \le nR - (n-1)H^{\downarrow,W}_{1+\theta}(X|Y) + \frac{1}{\theta}\big(\log 2 - \delta(\theta)\big). \tag{212}$$

Proof: Theorem 27 can be shown in the same way
as Theorem 13 by replacing $H^{W}_{1+\theta}(X)$ and Theorem 9 by
$H^{\downarrow,W}_{1+\theta}(X|Y)$ and Theorem 22, respectively. ■

To lower bound $D(e^{nR})$ by the lower conditional Rényi
entropy of the transition matrix, we substitute the other formula
for the lower conditional Rényi entropy given in Lemma 8
into the bound in Proposition 2. Then, we have the following bound.

Theorem 28 For 9 £ [0,1], we have 

D(e nR ) > nR - (n - 1 )H^(X) + (213) 

6 

Proof: Theorem 28 can be shown in the same way as
Theorem 14 by replacing $H^{W}_{1-\theta}(X)$ and Proposition 1 by
$H^{\downarrow,W}_{1-\theta}(X|Y)$ and Proposition 2, respectively. ■


D. Large Deviation 

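We can show the following theorem in the same way as
Theorem 15 by taking the limit in Theorems 23 and 24 with
the use of Lemma 6.

Theorem 29 Suppose that a transition matrix $W$ satisfies
Assumption 1. For $R < H^{W}(X|Y)$, we have

$$\liminf_{n\to\infty} -\frac{1}{n}\log \overline{\Delta}(e^{nR}) \ge \sup_{0\le\theta\le1} \frac{-\theta R + \theta H^{\downarrow,W}_{1+\theta}(X|Y)}{1+\theta}. \tag{214}$$

On the other hand, for $a < R < H^{W}(X|Y)$, we have

$$\limsup_{n\to\infty} -\frac{1}{n}\log \Delta(e^{nR}) \le -\theta(R)R + \theta(R)H^{\downarrow,W}_{1+\theta(R)}(X|Y) \tag{215}$$

$$= \sup_{0\le\theta} \big[-\theta R + \theta H^{\downarrow,W}_{1+\theta}(X|Y)\big], \tag{216}$$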


$$\limsup_{n\to\infty} -\frac{1}{n}\log \overline{\Delta}(e^{nR}) \le -\theta(a(R))a(R) + \theta(a(R))H^{\uparrow,W}_{1+\theta(a(R))}(X|Y) \tag{218}$$

$$= \sup_{0\le\theta} \frac{-\theta R + \theta H^{\uparrow,W}_{1+\theta}(X|Y)}{1+\theta}, \tag{219}$$

where $\theta(a)$ and $a(R)$ are the inverse functions $\theta^{\uparrow}(a)$ and
$a^{\uparrow}(R)$ defined by (56) and (58), respectively.

Due to Lemma 7, the lower bound (217) and the upper
bound (218) coincide when $R$ is not less than the critical rate
$R_{\mathrm{cr}}$.

Proof: (206) in Theorem 25 yields (217). Lemma 7
guarantees (219). So, we will prove (218).

We fix $s > 0$ and $\tilde\theta > \theta(a(R))$. Then, (208) implies that

lim --logA (M n ) 

n—> oo Ti 

1 + , WR)) Wr)j 

( 220 ) 


< 1 h w 


l+6,l+6(a(R)) 


Similar to (1 152b . taking the limits s —> 0 and 9 —> 9(a(R)), 
we have 


1 + S; 


°\K + ~e,i + e {am ( x \ Y ) 


- H 


w 


d9H^ 0 1+e ( a ( iJ )) 

~ e - do — 


l+(l+s)0, l+0(a(.R)) 

(X\Y) 

9=e 


(X\Y) 


^ H Z§,i + o MR)) ( X \ Y ) ( ass ^°) 


- 9(a(R )) 


dOH™ 0 ' 1+e(a(R)) (X\Y) 


d 6 


9—6(a(R)) 


+ 6 (a(R))Hl$ {aiR)) (X\Y) (as 9 ->• 0(a(R ))) 

= 0(a(R))a + 0(a(R))Hl$ {a(R)) (X\Y). (221) 

where (a) follows from ([56} . Hence. ( 1221b and (1220b imply 
that 

lo S < 9(a(R))a + 0(a(R))Hl^ {a{R)) (X\Y), 

( 222 ) 


which implies (1218b . 
E. Moderate Deviation


V. Discussion and Conclusion 


Taking the limit with $R = H^{W}(X|Y) - n^{-t}\delta$ in Theorem
23 and Theorem 24, we have the following.

Theorem 31 Suppose that a transition matrix $W$ satisfies
Assumption 1. For arbitrary $t \in (0,1/2)$ and $\delta > 0$, we have

$$\lim_{n\to\infty} -\frac{1}{n^{1-2t}}\log \Delta\big(e^{nH^{W}(X|Y) - n^{1-t}\delta}\big) = \lim_{n\to\infty} -\frac{1}{n^{1-2t}}\log \overline{\Delta}\big(e^{nH^{W}(X|Y) - n^{1-t}\delta}\big) = \frac{\delta^2}{2V^{W}(X|Y)}. \tag{223}$$

Proof: This theorem can be shown in the same way as
Theorem 17 by replacing (138) and (139) by (201) and (202),
respectively. ■


F. Second Order 

By applying the central limit theorem to Lemmas 26 and
30, and by using Theorem 2, we have the following.

Theorem 32 Suppose that a transition matrix $W$ satisfies
Assumption 1. For arbitrary $\varepsilon \in (0,1)$, we have

$$\lim_{n\to\infty} \frac{\log M(n,\varepsilon) - nH^{W}(X|Y)}{\sqrt{n}} = \lim_{n\to\infty} \frac{\log \overline{M}(n,\varepsilon) - nH^{W}(X|Y)}{\sqrt{n}} = \sqrt{V^{W}(X|Y)}\,\Phi^{-1}(\varepsilon). \tag{224}$$

Proof: The central limit theorem for Markovian process 
fidl . l48l , |[49l f35l Corollary 6.2.] guarantees that the random 
variable $(-\log P_{X^n|Y^n}(X^n|Y^n) - nH^{W}(X|Y))/\sqrt{n}$ asymptotically
obeys the normal distribution with average 0 and
variance $V^{W}(X|Y)$. This theorem can be shown in the
same way as Theorem 18 by replacing the roles of Lemmas
18 and 22 by those of Lemmas 26 and 30 with $Q_Y = P_Y$,
respectively. ■


G. Modified Mutual Information Rate (MMIR) 

Taking the limit in Theorems 27 and 28, we have the
following.

Theorem 33 Suppose that a transition matrix $W$ satisfies
Assumption 1. The modified mutual information rate (MMIR)
is asymptotically calculated as

$$\lim_{n\to\infty} \frac{1}{n}D(e^{nR}) = \lim_{n\to\infty} \frac{1}{n}\overline{D}(e^{nR}) = [R - H^{W}(X|Y)]_{+}. \tag{225}$$

Proof: Theorem 33 can be shown in the same way as
Theorem 19. ■


In this paper, we have derived non-asymptotic bounds
on uniform random number generation with/without information
leakage for the Markovian case. In these bounds, the
difference between $\Delta(M)$ and $\overline{\Delta}(M)$ is asymptotically negligible
at least in the moderate deviation regime and the second
order regime. The same relation holds between $D(M)$ and
$\overline{D}(M)$. Hence, we can conclude that it is enough to employ
any two-universal hash function even for the Markovian case.

Here, to discuss the practical importance of non-asymptotic
results, we remark on a difference between uniform random
number generation and channel and source coding. When we
construct a practical system, we need to consider two issues:

• How to quantitatively guarantee the performance,

• How to implement the system efficiently.

Uniform random number generation does not have to care
about decoding complexity, whereas the coding problems
require decoding, which requires a huge amount of computation.
Furthermore, it is also known that universal
hash functions can be constructed by a combination of a Toeplitz
matrix and the identity matrix. This construction has small
complexity and was implemented in a real demonstration.
Hence, our non-asymptotic results can be directly
used as a performance guarantee of a practical system even
when the source distribution has memory.

Recently, Tsurumaru et al DU proposed a new class of hash 
functions, so called e-almost dual universal hash functions. 
Then, the recent paper tna invented more efficient hash 
functions with less random seeds, which belong to e-almost 
dual universal hash functions. Hence, it is needed to extend our 
result to e-almost dual universal hash functions. Fortunately, 
another recent paper |[28l has already shown similar results 
with e-almost dual universal hash functions in the i.i.d. case. 
So, it is not so difficult to extend the results in l28l to the 
Markovian case. 

In this paper, we have assumed that the transition matrix
describing the Markov chain is irreducible. When
the transition matrix has several irreducible components, we
need to consider the mixture distribution among the possible
irreducible components, which is defined by the initial
distribution. As discussed in [54, Theorem 1], in a finite
state space, the asymptotic behavior of the (conditional) Rényi
entropy is characterized by the maximum (conditional) Rényi
entropy among the possible irreducible components, which
depend on the initial distribution. Hence, for large deviation
and moderate deviation, the exponential decreasing rate of
the leaked information can be evaluated by the minimum rate
among the possible irreducible components. On the other hand,
in the case of a mixture of i.i.d. sources, when we fix the
first and second orders of the coding rate, the limit of the
decoding error probability is given by the stochastic mixture
of the Gaussian distributions corresponding to the i.i.d. sources
[55]. So, for the second order analysis in the Markovian
case, we can expect a similar characterization by using the
stochastic mixture of the Gaussian distributions corresponding
to the irreducible components. Such an analysis remains
for a future study.
Appendix A 
Tail probability 

In converse proofs, we use some techniques to bound 
tail probabilities from [34], [35]. For this purpose, we need to 
translate some terminology in statistics into terminology 
in information theory. In this appendix, we introduce some 
terminology and bounds from [34], [35]. For proofs, see [34], 
[35]. 

A. Single-Shot Setting 

Let $Z$ be a real valued random variable with distribution $P$. 
Let 

$$\phi(\rho) := \log \mathsf{E}\left[e^{\rho Z}\right] = \log \sum_z P(z) e^{\rho z} \tag{226}$$

be the cumulant generating function (CGF). Let us introduce 
an exponential family 

$$P_\rho(z) := P(z) e^{\rho z - \phi(\rho)}. \tag{227}$$

By differentiating the CGF, we find that 

$$\phi'(\rho) = \mathsf{E}_\rho[Z] := \sum_z P_\rho(z) z. \tag{228}$$

We also find that 

$$\phi''(\rho) = \sum_z P_\rho(z) \left(z - \mathsf{E}_\rho[Z]\right)^2. \tag{229}$$

We assume that $Z$ is not constant. Then, (229) implies that 
$\phi(\rho)$ is a strictly convex function and $\phi'(\rho)$ is monotonically 
increasing. Thus, we can define the inverse function $\rho(a)$ of 
$\phi'(\rho)$ by 

$$\phi'(\rho(a)) = a. \tag{230}$$
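
The quantities in (226)-(230) can be evaluated directly for a finite distribution. The sketch below is only an illustration: the dictionary representation of $P$, the function names, and the bisection bracket $[-50, 50]$ are assumptions of this example. It computes $\phi(\rho)$, the tilted distribution $P_\rho$, the derivatives $\phi'(\rho)$ and $\phi''(\rho)$ as the tilted mean and variance, and the inverse function $\rho(a)$ by bisection, which is valid because $\phi'$ is monotonically increasing.

```python
import math


def cgf(rho, dist):
    """phi(rho) = log sum_z P(z) exp(rho z), as in (226)."""
    return math.log(sum(p * math.exp(rho * z) for z, p in dist.items()))


def tilted(rho, dist):
    """Exponential family P_rho(z) = P(z) exp(rho z - phi(rho)), as in (227)."""
    phi = cgf(rho, dist)
    return {z: p * math.exp(rho * z - phi) for z, p in dist.items()}


def cgf_derivative(rho, dist):
    """phi'(rho) = E_rho[Z], as in (228)."""
    return sum(p * z for z, p in tilted(rho, dist).items())


def cgf_second_derivative(rho, dist):
    """phi''(rho) = variance of Z under P_rho, as in (229)."""
    t = tilted(rho, dist)
    mean = sum(p * z for z, p in t.items())
    return sum(p * (z - mean) ** 2 for z, p in t.items())


def inverse_rho(a, dist, lo=-50.0, hi=50.0, iters=200):
    """Solve phi'(rho) = a by bisection, as in (230).

    Assumes min z < a < max z and that the solution lies in [lo, hi].
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cgf_derivative(mid, dist) < a:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    dist = {0.0: 0.5, 1.0: 0.3, 3.0: 0.2}  # hypothetical example distribution
    rho = inverse_rho(2.0, dist)
    print(rho, cgf_derivative(rho, dist))
```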

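Let 

$$D_{1+s}(P\|Q) := \frac{1}{s} \log \sum_z P(z)^{1+s} Q(z)^{-s} \tag{231}$$

be the Rényi divergence. Then, we have the following relation: 

$$s D_{1+s}(P_{\tilde\rho}\|P_\rho) = \phi((1+s)\tilde\rho - s\rho) - (1+s)\phi(\tilde\rho) + s\phi(\rho). \tag{232}$$

The following bounds on tail probabilities will be used later. 

Proposition 3 ([35, Theorem A.2]) For any $a > \mathsf{E}[Z]$, we 
have 

$$-\log P\{Z > a\} \le \inf_{\substack{s>0 \\ \tilde\rho\in\mathbb{R},\, \sigma\ge 0}} \frac{1}{s}\Big[\phi((1+s)\tilde\rho) - (1+s)\phi(\tilde\rho) - (1+s)\log\Big(1 - e^{-[\phi(\tilde\rho) - \phi(\tilde\rho-\sigma) - \sigma a]}\Big)\Big] \tag{233}$$

$$\le \inf_{\substack{s>0 \\ \tilde\rho>\rho(a)}} \frac{1}{s}\Big[\phi((1+s)\tilde\rho) - (1+s)\phi(\tilde\rho) - (1+s)\log\Big(1 - e^{-[(\rho(a)-\tilde\rho)a - \phi(\rho(a)) + \phi(\tilde\rho)]}\Big)\Big]. \tag{234}$$

Similarly, for any $a < \mathsf{E}[Z]$, we have 

$$-\log P\{Z < a\} \le \inf_{\substack{s>0 \\ \tilde\rho\in\mathbb{R},\, \sigma\ge 0}} \frac{1}{s}\Big[\phi((1+s)\tilde\rho) - (1+s)\phi(\tilde\rho) - (1+s)\log\Big(1 - e^{-[\sigma a - \phi(\tilde\rho+\sigma) + \phi(\tilde\rho)]}\Big)\Big] \tag{235}$$

$$\le \inf_{\substack{s>0 \\ \tilde\rho<\rho(a)}} \frac{1}{s}\Big[\phi((1+s)\tilde\rho) - (1+s)\phi(\tilde\rho) - (1+s)\log\Big(1 - e^{-[(\rho(a)-\tilde\rho)a - \phi(\rho(a)) + \phi(\tilde\rho)]}\Big)\Big]. \tag{236}$$

B. Transition Matrix 

The discussion in this and the next subsections is a generalization 
of that for the lower conditional Rényi entropy 
$H_{1+\theta}^{\downarrow,W}(X|Y)$ in the following sense. In these subsections, 
the set $\mathcal{Z}$ and the functions $g$, $\tilde g$, and $\phi(\rho)$ are addressed. 
The set $\mathcal{Z}$ is the generalization of $\mathcal{X}\times\mathcal{Y}$, and the functions 
$g$, $\tilde g$, and $\phi(\rho)$ are the generalizations of $\log W - \log W_Y$, 
$\log P_{X_1 Y_1} - \log P_{Y_1}$, and $-\theta H_{1+\theta}^{\downarrow,W}(X|Y)$, respectively. Under 
this generalization, the same notation has the same meaning 
as for the lower conditional Rényi entropy $H_{1+\theta}^{\downarrow,W}(X|Y)$. 

Let $\{W(z|z')\}_{(z,z')\in\mathcal{Z}^2}$ be an ergodic and irreducible transition 
matrix, and let $P$ be its stationary distribution. For a 
function $g : \mathcal{Z}\times\mathcal{Z} \to \mathbb{R}$, let 

$$\mathsf{E}[g] := \sum_{z,z'} P(z') W(z|z') g(z,z'). \tag{237}$$

We also introduce the following tilted matrix: 

$$W_\rho(z|z') := W(z|z') e^{\rho g(z,z')}. \tag{238}$$

Let $\lambda_\rho$ be the Perron-Frobenius eigenvalue of $W_\rho$. Then, the 
CGF for $W$ with generator $g$ is defined by 

$$\phi(\rho) := \log \lambda_\rho. \tag{239}$$

Lemma 33 The function $\phi(\rho)$ is a convex function of $\rho$, and 
it is strictly convex iff $\phi''(0) > 0$. 

From Lemma 33, $\phi'(\rho)$ is a monotonically increasing function. Thus, 
we can define the inverse function $\rho(a)$ of $\phi'(\rho)$ by 

$$\phi'(\rho(a)) = a. \tag{240}$$

C. Markov Chain 

Let $\mathbf{Z} = \{Z^n\}_{n=1}^\infty$ be the Markov chain induced by $W(z|z')$ 
and an initial distribution $P_{Z_1}$. For functions $g : \mathcal{Z}\times\mathcal{Z} \to \mathbb{R}$ 
and $\tilde g : \mathcal{Z} \to \mathbb{R}$, let $S_n := \sum_{i=2}^n g(Z_i, Z_{i-1}) + \tilde g(Z_1)$. Then, 
the CGF for $S_n$ is given by 

$$\phi_n(\rho) := \log \mathsf{E}\left[e^{\rho S_n}\right]. \tag{241}$$

We will use the following finite evaluation for $\phi_n(\rho)$. 

Lemma 34 ([35, Lemma 5.1]) Let $v_\rho$ be the eigenvector of 
$W_\rho^T$ with respect to the Perron-Frobenius eigenvalue $\lambda_\rho$ such 
that $\min_z v_\rho(z) = 1$. Let $w_\rho(z) := P_{Z_1}(z) e^{\rho \tilde g(z)}$. Then, we 
have 

$$(n-1)\phi(\rho) + \underline{\delta}(\rho) \le \phi_n(\rho) \le (n-1)\phi(\rho) + \overline{\delta}(\rho), \tag{242}$$

where 

$$\overline{\delta}(\rho) := \log \langle v_\rho | w_\rho \rangle, \tag{243}$$

$$\underline{\delta}(\rho) := \log \langle v_\rho | w_\rho \rangle - \log \max_z v_\rho(z). \tag{244}$$

From this lemma, we have the following. 

Corollary 1 For any initial distribution and $\rho \in \mathbb{R}$, we have 

$$\lim_{n\to\infty} \frac{1}{n} \phi_n(\rho) = \phi(\rho). \tag{245}$$

The relation 

$$\lim_{n\to\infty} \frac{1}{n} \mathsf{E}[S_n] = \phi'(0) = \mathsf{E}[g] \tag{246}$$

is well known. Furthermore, we also have the following. 

Lemma 35 For any initial distribution, we have 

$$\lim_{n\to\infty} \frac{1}{n} \mathrm{Var}[S_n] = \phi''(0). \tag{247}$$

Finally, we also use the following bound on tail probabilities. 

Proposition 4 ([35, Theorem 7.2]) For any $a > \mathsf{E}[g]$, we 
have 

$$-\log P\{S_n > an\} \le \inf_{\substack{s>0 \\ \tilde\rho>\rho(a)}} \frac{1}{s}\Big[(n-1)\big(\phi((1+s)\tilde\rho) - (1+s)\phi(\tilde\rho)\big) + \delta_1 - (1+s)\log\Big(1 - e^{(n-1)[(\tilde\rho-\rho(a))a + \phi(\rho(a)) - \phi(\tilde\rho)] + \delta_2}\Big)\Big], \tag{248}$$

where 

$$\delta_1 := \overline{\delta}((1+s)\tilde\rho) - (1+s)\underline{\delta}(\tilde\rho), \tag{249}$$

$$\delta_2 := (\tilde\rho - \rho(a))a + \overline{\delta}(\rho(a)) - \underline{\delta}(\tilde\rho). \tag{250}$$

Similarly, for any $a < \mathsf{E}[g]$, we have 

$$-\log P\{S_n < an\} \le \inf_{\substack{s>0 \\ \tilde\rho<\rho(a)}} \frac{1}{s}\Big[(n-1)\big(\phi((1+s)\tilde\rho) - (1+s)\phi(\tilde\rho)\big) + \delta_1 - (1+s)\log\Big(1 - e^{(n-1)[(\tilde\rho-\rho(a))a + \phi(\rho(a)) - \phi(\tilde\rho)] + \delta_2}\Big)\Big]. \tag{251}$$

Appendix B 
Proof of Lemma 12 

We first prove the following lemma. 

Lemma 36 Suppose that $x_1 = x_n$. Then, we have 

$$\prod_{i=2}^n W(x_i|x_{i-1}) \le e^{-(n-1)H_\infty^W(X)}. \tag{252}$$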

Proof: When the cycle $c = \{(x_1, x_2), \ldots, (x_{n-1}, x_n)\}$ is a 
Hamilton cycle, the statement is obvious from the definition 
of $H_\infty^W(X)$. Otherwise, there exists a Hamilton cycle $c' = 
\{(x_j, x_{j+1}), \ldots, (x_{k-1}, x_k)\}$ in $c$. Then, we have 

$$\prod_{i=2}^n W(x_i|x_{i-1}) = \prod_{(x',x)\in c\setminus c'} W(x|x') \prod_{(x',x)\in c'} W(x|x') \le \prod_{(x',x)\in c\setminus c'} W(x|x')\, e^{-(k-j)H_\infty^W(X)}. \tag{253}$$

Since $c\setminus c'$ is also a cycle, by repeating this procedure, we have 
the statement of the lemma. ■ 

We now go back to the proof of Lemma 12. To prove the 
left hand side inequality of (88), we need to upper bound 
$\max_{x^n} P_{X^n}(x^n)$. 

For a given $x^n$ satisfying the relation $x_1 \neq x_n$, we 
choose an extension $x^m = (x_1, \ldots, x_m)$ of $x^n$ as follows. 
(1) $x_m$ is chosen to be $x_1$. (2) The path $c = 
\{(x_n, x_{n+1}), \ldots, (x_{m-1}, x_m)\}$ from $x_n$ to $x_m$ is chosen as the 
Hamilton path $\operatorname{argmax}_c \prod_{(x_a,x_b)\in c} W(x_b|x_a)$. Then, we have 

$$A P_{X^n}(x^n) \le P_{X^m}(x^m) \overset{(a)}{\le} \max_x P_{X_1}(x)\, e^{-(m-1)H_\infty^W(X)} \le \max_x P_{X_1}(x)\, e^{-(n-1)H_\infty^W(X)}, \tag{254}$$

where (a) follows from Lemma 36. For a given $x^n$ satisfying 
the relation $x_1 = x_n$, Lemma 36 implies that 

$$P_{X^n}(x^n) \le \max_x P_{X_1}(x)\, e^{-(n-1)H_\infty^W(X)}. \tag{255}$$

Since $A \le 1$, we have the left hand side inequality of (88) in 
both cases. 

To show the opposite inequality, let $\hat{x} = \operatorname{argmax}_x P_{X_1}(x)$. 
Assume that $\hat{x} \neq x^*$. Then, let $x^m$ be the sequence that 
starts with $\hat{x}$, whose first part constitutes a Hamilton path 
$c_0 = \operatorname{argmax}_c \prod_{(x_a,x_b)\in c} W(x_b|x_a)$, and in which the sequence 
corresponding to the cycle $c^*$ is then repeated $\lceil (n - |c_0|)/|c^*| \rceil$ 
times. Then, we have 

$$\max_{x^n} P_{X^n}(x^n) \ge \max_{x'^m} P_{X^m}(x'^m) \ge P_{X^m}(x^m)$$
$$\ge P_{X_1}(\hat{x})\, A\, e^{-\lceil (n-|c_0|)/|c^*| \rceil |c^*| H_\infty^W(X)}$$
$$\ge P_{X_1}(\hat{x})\, A\, e^{-\{(n-|c_0|)+|c^*|\} H_\infty^W(X)}$$
$$\ge P_{X_1}(\hat{x})\, A\, e^{-\{(n-1)+|c^*|\} H_\infty^W(X)}. \tag{256}$$
Assume that $\hat{x} = x^*$. Then, we construct the sequence in the same way, 
omitting the first part. So, we have 

$$\max_{x^n} P_{X^n}(x^n) \ge \max_{x'^m} P_{X^m}(x'^m) \ge P_{X^m}(x^m)$$
$$\ge P_{X_1}(\hat{x})\, e^{-\lceil n/|c^*| \rceil |c^*| H_\infty^W(X)}$$
$$\ge P_{X_1}(\hat{x})\, e^{-\{n+|c^*|\} H_\infty^W(X)}. \tag{257}$$

Combining (256) and (257), we have the right hand side 
inequality of (88). ■ 
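
The statement proved above can be checked numerically on a small chain. The sketch below is an illustration under an explicit assumption: $H_\infty^W(X)$ is taken to be $-\log$ of the largest per-step geometric mean of transition probabilities over simple directed cycles (self-loops included), which is how the proof uses it; its formal definition lies outside this appendix. The function names and the example matrix are hypothetical. The code computes $\max_{x^n} P_{X^n}(x^n)$ by a max-product recursion and compares $-\frac{1}{n}\log\max_{x^n} P_{X^n}(x^n)$ with the cycle-based rate.

```python
import math


def simple_cycles(W):
    """Yield simple directed cycles as lists of states.

    W[x][xp] is the probability of moving from state xp to state x.
    Each cycle is reported once, starting from its smallest state.
    """
    size = len(W)

    def extend(start, path):
        last = path[-1]
        for nxt in range(size):
            if W[nxt][last] <= 0.0:
                continue
            if nxt == start:
                yield list(path)
            elif nxt > start and nxt not in path:
                yield from extend(start, path + [nxt])

    for s in range(size):
        yield from extend(s, [s])


def min_entropy_rate(W):
    """Assumed form of H_infty^W: -log of the best per-step cycle weight."""
    best = -math.inf
    for cyc in simple_cycles(W):
        length = len(cyc)
        logw = sum(math.log(W[cyc[(i + 1) % length]][cyc[i]])
                   for i in range(length))
        best = max(best, logw / length)
    return -best


def max_path_log_prob(W, init, n):
    """log max_{x^n} P_{X^n}(x^n) via a max-product (Viterbi-type) recursion."""
    size = len(W)
    score = [math.log(p) if p > 0 else -math.inf for p in init]
    for _ in range(n - 1):
        score = [
            max(score[prev] + (math.log(W[cur][prev]) if W[cur][prev] > 0
                               else -math.inf)
                for prev in range(size))
            for cur in range(size)
        ]
    return max(score)


if __name__ == "__main__":
    W = [[0.9, 0.5], [0.1, 0.5]]  # hypothetical two-state chain
    init = [0.5, 0.5]
    n = 200
    print(-max_path_log_prob(W, init, n) / n, min_entropy_rate(W))
```

For this chain the best cycle is the self-loop at state 0, so the two printed values approach each other as $n$ grows, in line with the two-sided bound of Lemma 12.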


Appendix C 
Proof of Lemma 11 

To prove (86), we use the limiting results (68) and (91). 
More precisely, we have 

$$\lim_{\theta\to\infty} H_{1+\theta}^W(X) = \lim_{\theta\to\infty} \lim_{n\to\infty} \frac{1}{n} H_{1+\theta}(X^n) = \lim_{n\to\infty} \lim_{\theta\to\infty} \frac{1}{n} H_{1+\theta}(X^n) = \lim_{n\to\infty} \frac{1}{n} H_\infty(X^n) = H_\infty^W(X). \tag{258}$$

To complete the proof, we need to show that the order of the 
limits can be changed, which is justified if $\overline{\delta}(\theta)/\theta$ and $\underline{\delta}(\theta)/\theta$ 
are bounded. For this purpose, it suffices to show $w_\theta(x) \le 
M^{1+\theta}$ and $v_\theta(x) \le \tilde{M}^{1+\theta}$ for some constants $M$, $\tilde{M}$ because 
these relations imply that 

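Thus, we have 

$$v_\theta(1) = \frac{(\lambda_\theta)^m v_\theta(1)}{(\lambda_\theta)^m v_\theta(|\mathcal{X}|)} \overset{(a)}{=} \frac{\sum_{x'} (W_\theta^T)^m(1|x')\, v_\theta(x')}{\sum_{x'} (W_\theta^T)^m(|\mathcal{X}|\,|x')\, v_\theta(x')}$$
$$\le \frac{\sum_{x'} (W_\theta^T)^m(1|x')\, v_\theta(x')}{(W_\theta^T)^m(|\mathcal{X}|\,|1)\, v_\theta(1)} \le \frac{\sum_{x'} (W_\theta^T)^m(1|x')}{(W_\theta^T)^m(|\mathcal{X}|\,|1)}$$
$$\overset{(b)}{\le} \frac{|\mathcal{X}|\, |\mathcal{X}|^{m-1} \left(\max_{x,\tilde{x}} W(\tilde{x}|x)\right)^{m(1+\theta)}}{\left(\min_{x,\tilde{x}:\, W(\tilde{x}|x)>0} W(\tilde{x}|x)\right)^{m(1+\theta)}} = |\mathcal{X}|^m \left(\frac{\max_{x,\tilde{x}} W(\tilde{x}|x)}{\min_{x,\tilde{x}:\, W(\tilde{x}|x)>0} W(\tilde{x}|x)}\right)^{m(1+\theta)}$$
$$\le \left[\,|\mathcal{X}|^m \left(\frac{\max_{x,\tilde{x}} W(\tilde{x}|x)}{\min_{x,\tilde{x}:\, W(\tilde{x}|x)>0} W(\tilde{x}|x)}\right)^{m}\right]^{1+\theta}, \tag{262}$$

where (a) and (b) follow from (259) and the pair of (260) and 
(261), respectively. Hence, we have the desired bound. ■ 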


Since 


Appendix D 
Proof of Lemma 15 

1 < Ex 1+S < W we have 


-i log |*| 

< : + \ log M l+e < ^ log M 1+e . 

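The former is obvious. To prove the latter, without loss of 
generality, we can assume that $\mathcal{X} = \{1, 2, \ldots, |\mathcal{X}|\}$ and that 
$v_\theta(1) \ge \cdots \ge v_\theta(|\mathcal{X}|) = 1$. Since $W_\theta^T$ is irreducible, we can 
fix an integer $m$ such that $(W_\theta^T)^m(|\mathcal{X}|\,|1) > 0$. Since $v_\theta$ is an 
eigenvector, we have 

$$\sum_{x'} (W_\theta^T)^m(x|x')\, v_\theta(x') = (\lambda_\theta)^m v_\theta(x). \tag{259}$$

On the other hand, we have 

$$(W_\theta^T)^m(1|x') = \sum_{x_1, x_2, \ldots, x_{m-1}} W_\theta^T(1|x_{m-1}) \cdots W_\theta^T(x_2|x_1)\, W_\theta^T(x_1|x')$$
$$\le |\mathcal{X}|^{m-1} \left(\max_{x,\tilde{x}} W_\theta^T(x|\tilde{x})\right)^m = |\mathcal{X}|^{m-1} \left(\max_{x,\tilde{x}} W(\tilde{x}|x)\right)^{m(1+\theta)}. \tag{260}$$

Since there exists at least one sequence $x_1, x_2, \ldots, x_{m-1}$ 
such that $W_\theta^T(|\mathcal{X}|\,|x_{m-1}) \cdots W_\theta^T(x_2|x_1)\, W_\theta^T(x_1|1) > 0$, we 
have 

$$(W_\theta^T)^m(|\mathcal{X}|\,|1) = \sum_{x_1, x_2, \ldots, x_{m-1}} W_\theta^T(|\mathcal{X}|\,|x_{m-1}) \cdots W_\theta^T(x_2|x_1)\, W_\theta^T(x_1|1)$$
$$\ge \left(\min_{x,\tilde{x}:\, W(\tilde{x}|x)>0} W_\theta^T(x|\tilde{x})\right)^m = \left(\min_{x,\tilde{x}:\, W(\tilde{x}|x)>0} W(\tilde{x}|x)\right)^{m(1+\theta)}. \tag{261}$$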


Kg(y\y') 

=Wy(M)TW) (V. ( WX ' X ‘'~~ 


>W Y (y\y')T(y\y') 


l+6>\ T+e 


(263) 


as $\theta \to \infty$. Thus, by the continuity of eigenvalues with respect 
to the matrix, we have $\kappa_\theta \to \kappa_\infty$, which implies (100). ■ 


Appendix E 
Proof of Theorem 6 

To prove (101), we note that $P_{X^n|Y^n}$ can be written as 

$$P_{X^n|Y^n}(x^n|y^n) = P_{X_1|Y_1}(x_1|y_1) \prod_{i=2}^n W_{X|X',Y',Y}(x_i|x_{i-1}, y_{i-1}, y_i). \tag{264}$$

Thus, in a similar manner as the proof of Lemma 12, we can 
derive an upper bound and a lower bound on $H_\infty^{\downarrow}(X^n|Y^n)$, 
from which we can derive (101). 

On the other hand, to show (102), we have 

$$e^{-H_\infty^{\uparrow}(X^n|Y^n)} = \sum_{y^n} P_{Y^n}(y^n) \max_{x^n} P_{X^n|Y^n}(x^n|y^n)$$
$$= \sum_{y^n} P_{Y_1}(y_1) \max_{x_1} P_{X_1|Y_1}(x_1|y_1) \prod_{i=2}^n W_Y(y_i|y_{i-1})\, T(y_i|y_{i-1}). \tag{265}$$

Thus, in a similar manner as the proof of Lemma 9 shown in 
[30, Lemma 10], we can derive an upper bound and a lower 
bound on $H_\infty^{\uparrow}(X^n|Y^n)$, from which we can derive (102). 
Appendix F 
Proof of Lemma 24 

Let 

$$\Omega := \left\{ x : \log \frac{1}{P_X(x)} \le a \right\}. \tag{266}$$

Then, for $\rho \le 1$, we have 

$$|\Omega| \le \sum_{x\in\Omega} e^{(1-\rho)\left(a - \log \frac{1}{P_X(x)}\right)} \le \sum_x P_X(x)^{1-\rho} e^{(1-\rho)a} = e^{(1-\rho)a + \phi(\rho; P_X)}, \tag{267}$$


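$$\Omega_n := \left\{ x^n : \log \frac{1}{P_{X^n}(x^n)} \le an \right\}. \tag{269}$$

Then, for any $\rho \le 1$, we have 

$$|\Omega_n| \le e^{(1-\rho)an + \phi(\rho; P_{X^n})} = e^{(1+\theta)an - \theta H_{1+\theta}(X^n)} \le e^{(1+\theta)an - (n-1)\theta H_{1+\theta}^W(X) - \underline{\delta}(\theta)}, \tag{270}$$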


Sly = < x : log 


4 1+ %) 

Pxy(x, y) 


< a 


(273) 


Then, for any 9 > — 1, we have 

( pfi+S), , 

(1+0) a—log p* (X ' V) 


1+0 


l^yl — e v 

< P (i+e)o Pxv{x,y) 

* 4 1+ %) 1+9 


(L)g(l+0)a 


E 


1+0 


Pxv{x,y) 

i n 1+0 


where $\phi(\rho; P)$ is defined in (226). Here, we set $\rho = \rho(a)$ and 
$a = a(R)$. Then, by noting (50), we have 

$$|\Omega| \le e^R = \frac{M}{2}. \tag{268}$$

Thus, by using Lemma 23, we have (131). ■ 
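
The counting bound $|\Omega| \le e^{(1-\rho)a + \phi(\rho; P_X)}$ for $\rho \le 1$ is easy to verify numerically. The following sketch uses a hypothetical four-point distribution and illustrative function names; it evaluates $\phi(\rho; P_X) = \log \sum_x P_X(x)^{1-\rho}$, which is the CGF of $Z = \log \frac{1}{P_X(X)}$, and compares both sides of the bound over a small grid of $a$ and $\rho$.

```python
import math


def cgf_self_information(rho, P):
    """phi(rho; P) for Z = log(1 / P(X)): log sum_x P(x)^(1 - rho)."""
    return math.log(sum(p ** (1.0 - rho) for p in P if p > 0))


def omega_size(a, P):
    """|Omega| for Omega = {x : log(1 / P(x)) <= a}."""
    return sum(1 for p in P if p > 0 and -math.log(p) <= a)


def size_bound(a, rho, P):
    """Right-hand side exp((1 - rho) a + phi(rho; P)), valid for rho <= 1."""
    return math.exp((1.0 - rho) * a + cgf_self_information(rho, P))


if __name__ == "__main__":
    P = [0.5, 0.25, 0.125, 0.125]  # hypothetical source distribution
    for a in (0.5, 1.0, 1.5, 2.5):
        for rho in (-2.0, -1.0, 0.0, 0.5, 1.0):
            print(a, rho, omega_size(a, P), size_bound(a, rho, P))
```

Choosing $\rho$ to minimize the right-hand side for a given $a$ gives the tightest count, which is the role played by the choice $\rho = \rho(a)$ in the proof.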


Appendix G 
Proof of Theorem 12 

The proof proceeds in almost the same manner as the proof 
of Lemma 24. Let 

E x ''PxY(x",y)W 

^ e (i+0)a-0irj' +e (x|r) ( 274 ) 

where (a)and ( b ) follow from (ITU and (ITOb , respectively. 
Thus, by setting 9 = 9(a) and a = a(R), and by noting 
(ED, we have 


| Sly | < e R = Mu. 

Thus, from Lemma [311 we have (1195b . 


(275) 


Appendix I 

Proof of Theorem 26 

The proof proceeds in a similar manner as the proof of 
Lemma 32. Let 

( P^ 1+ Ew n ) 

f2yn = J s" : log ■ Yn (V } 


< an 


(276) 


(269) 


y ^ P X n Yn ( X u, y n) j 

Then, for any 9 > — 1, we have (cf. the proof of Lemma [32l) 


I Sly n | ^ e 


(l+e)an-8Hl +g (X ri \Y n ) 
B (l+e)an-(n-l)eH^(X\Y)-(l+e)i(e) 


where we changed variable as $\rho = -\theta$ and used Lemma 8. 
Here, we set $\theta = \theta(a)$ and $a = a(R)$. Then, by noting (50), 
we have 

$$|\Omega_n| \le e^{(n-1)R + \{(1+\theta(a(R)))a(R) - \underline{\delta}(\theta(a(R)))\}} = \frac{M_n}{2}. \tag{271}$$

Thus, by using Lemma 23, we have 

$$\Delta(M_n) \ge \frac{1}{2} P_{X^n}\left\{ \log \frac{1}{P_{X^n}(x^n)} \le a(R)n \right\}. \tag{272}$$

Finally, by using Proposition 4 and changing the variable as 
$\rho = -\theta$, we have the assertion of the theorem. ■ 
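
The finite evaluation used in (270) can be illustrated numerically. The sketch below assumes, as in the transition-matrix setting of Appendix A with $\rho = -\theta$, that $H_{1+\theta}^W(X) = -\frac{1}{\theta}\log\lambda_\theta$, where $\lambda_\theta$ is the Perron-Frobenius eigenvalue of the matrix with entries $W(x|x')^{1+\theta}$; the formal definition lies outside this appendix. The power iteration, the function names, and the example chain are choices made for this illustration. The code computes $H_{1+\theta}(X^n)$ exactly by a matrix-vector recursion and compares $\frac{1}{n}H_{1+\theta}(X^n)$ with the eigenvalue-based rate.

```python
import math


def tilted_matrix(W, theta):
    """Entrywise power W(x|x')^(1 + theta); W[x][xp] is the move xp -> x."""
    return [[w ** (1.0 + theta) for w in row] for row in W]


def perron_eigenvalue(M, iters=2000):
    """Perron-Frobenius eigenvalue of a primitive nonnegative matrix
    by power iteration with sum-normalization."""
    size = len(M)
    v = [1.0 / size] * size
    lam = 1.0
    for _ in range(iters):
        nv = [sum(M[i][j] * v[j] for j in range(size)) for i in range(size)]
        lam = sum(nv)  # v sums to one, so this converges to the eigenvalue
        v = [x / lam for x in nv]
    return lam


def renyi_entropy_rate(W, theta):
    """Assumed form of H_{1+theta}^W(X): -(1/theta) log lambda_theta."""
    return -math.log(perron_eigenvalue(tilted_matrix(W, theta))) / theta


def renyi_entropy_block(W, init, theta, n):
    """H_{1+theta}(X^n) = -(1/theta) log sum_{x^n} P(x^n)^(1+theta)."""
    size = len(W)
    M = tilted_matrix(W, theta)
    vec = [p ** (1.0 + theta) for p in init]
    log_scale = 0.0
    for _ in range(n - 1):
        vec = [sum(M[i][j] * vec[j] for j in range(size)) for i in range(size)]
        total = sum(vec)
        log_scale += math.log(total)  # keep the running product in log domain
        vec = [x / total for x in vec]
    return -(log_scale + math.log(sum(vec))) / theta


if __name__ == "__main__":
    W = [[0.9, 0.5], [0.1, 0.5]]  # hypothetical two-state chain
    init = [0.5, 0.5]
    theta, n = 1.0, 500
    print(renyi_entropy_block(W, init, theta, n) / n,
          renyi_entropy_rate(W, theta))
```

The gap between the two printed values is of order $1/n$, consistent with a correction term that does not grow with the block length.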


Appendix H 
Proof of Lemma 32 


< g(.J.-t- pJ im-171-1 jo.n£_j_ 9 iyi|x J-(i-t-njtI.BJ ( 211 ) 

where we used Lemma 0 in the inequality. Here, we set 9 = 
9(a) and a = a(R). Then, by noting ( l58l ). we have 

|Sly„| < g(n-l)fl+{(l+0(a(it)))(a(it)-|(0(o(ii))))} 


Mn 

2 

Thus, by using Lemma [311 we have 


1 


Pi 1 „+0 (a(ii))) (y n ) 


A (M n ) > -P X nyn \ lOg ^2 -—— 

4 P X n Y «( x n , y n ) 


(278) 


< a(R)n } (219) 


Here, we denote the CGF with Z = log P ^(x y) by 
<f>(0; Pxy\Qy)- Then, we have 

9Hl +e (P XY \Q Y ) = P XY \P£ +0{a{m ). (280) 

Applying (1235b of Proposition[3]to the random variable Z = 

ptl+fl(oTS))) , Y s 

lo § Y P XY (X,Y) ’ we have 


p(l+0(o(7J))) i n \ 
-l0gP XnY n{\0g Y " [V ’ 


Px n Y n (x n :y n ) 


< a(R)r 


< inf - 

s>° g 
peR,cr>0 


4>((l+s)p-,PxnYn\P£+ e{aim ) 

-(1 +s)(j)(p; P X n Y n\P Y + e( - a{R))) ) -(1 + s) log (l - e Ch ) 


Let 
where 


Ch := — 


aa - </>(p + a; P X n Y n\PfJ e{am) ) 


■ P- 


X n Y r 


\P 


(1 +9(a(R))), 


yn 


We choose the variable $\tilde\rho$ to be $-\tilde\theta$ and restrict the variable 
$\sigma$ to be $\tilde\theta - \theta(a(R))$ with the condition $\tilde\theta \ge \theta(a(R))$. Then, 
we use (280) and Lemma 10. Hence, we have the assertion of the 
theorem. ■ 